How do I remove a substring from the end of a string?

Posted 2019-01-02 19:45

Question:

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

Now I do

url.rsplit('.com', 1)

Is there a better way?

Answer 1:

strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from the ends of x.
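To make the character-set behaviour concrete, here is a small sketch. The second and third inputs are made up for illustration and are not from the question.

```python
url = 'abcdc.com'

# strip() removes any of the characters '.', 'c', 'o', 'm' from both ends
# until it reaches a character outside that set.
stripped = url.strip('.com')
print(stripped)  # abcd

# The order of the characters in the argument makes no difference.
same = url.strip('moc.')
print(same)  # abcd

# Leading characters from the set are removed as well.
both_ends = 'com.abcdc.com'.strip('.com')
print(both_ends)  # abcd
```

This is why the trailing 'c' of 'abcdc' disappears: it is in the set, so stripping continues past '.com'.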

Instead, you could use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or using regular expressions:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
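As a side note to the endswith-and-slicing approach, Python 3.9 and later ship str.removesuffix, which does the same check internally. The sketch below wraps it in a helper (the name remove_suffix is hypothetical) that falls back to endswith and slicing on older interpreters.

```python
def remove_suffix(text, suffix):
    """Return text without a trailing suffix; unchanged if it is absent."""
    # str.removesuffix exists on Python 3.9+.
    if hasattr(text, 'removesuffix'):
        return text.removesuffix(suffix)
    # Fallback: guard against an empty suffix, because text[:-0] is ''.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_suffix('abcdc.com', '.com'))  # abcdc
print(remove_suffix('abcdc.org', '.com'))  # abcdc.org
```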


Answer 2:

If you are sure that the string only appears at the end, then the simplest way would be to use 'replace':

url = 'abcdc.com'
print(url.replace('.com', ''))
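The "only appears at the end" condition matters more than it might look. A quick sketch with a made-up URL shows what happens when the substring also occurs earlier:

```python
url = 'www.company.com'

# replace() removes every occurrence, including the '.com' in '.company'.
replaced = url.replace('.com', '')
print(replaced)  # wwwpany

# The count argument does not help: it counts from the left,
# so the first occurrence is removed and the trailing one is kept.
limited = url.replace('.com', '', 1)
print(limited)  # wwwpany.com
```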


Answer 3:

def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]


Answer 4:

Since it seems like nobody has pointed this out yet:

url = "www.example.com"
new_url = url[:url.rfind(".")]

This should be more efficient than the methods using split() as no new list object is created, and this solution works for strings with several dots.
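One caveat with the rfind approach: rfind returns -1 when there is no dot, and slicing with -1 silently drops the last character. A minimal guarded sketch (the function name and the 'localhost' input are hypothetical) could look like this:

```python
def drop_after_last_dot(url):
    """Remove the last '.' and everything after it, if there is a dot."""
    idx = url.rfind('.')
    # rfind returns -1 when '.' is not found; leave the string alone then.
    return url if idx == -1 else url[:idx]


print(drop_after_last_dot('www.example.com'))  # www.example
print(drop_after_last_dot('localhost'))        # localhost

# Without the guard, the missing dot costs one character:
unguarded = 'localhost'[:'localhost'.rfind('.')]
print(unguarded)  # localhos
```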



Answer 5:

It depends on what you know about your URL and exactly what you're trying to do. If you know that it will always end in '.com' (or '.net' or '.org'), then

 url=url[:-4]

is the quickest solution. If it's a more general URL, then you're probably better off looking into the urlparse library that comes with Python (urllib.parse in Python 3).

If, on the other hand, you simply want to remove everything after the final '.' in a string, then

url.rsplit('.',1)[0]

will work. Or, if you just want everything up to the first '.', then try

url.split('.',1)[0]
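For the "more general URL" case, a rough sketch with urllib.parse (the Python 3 home of urlparse) might look like the following. The example URL is made up, and note that a bare 'abcdc.com' without a scheme is parsed as a path, not as a host.

```python
from urllib.parse import urlparse

url = 'http://www.example.com/path/page.html?x=1'

# Pull out just the host name before touching any dots.
host = urlparse(url).hostname
print(host)  # www.example.com

# Everything before the final '.':
without_tld = host.rsplit('.', 1)[0]
print(without_tld)  # www.example

# Everything up to the first '.':
first_label = host.split('.', 1)[0]
print(first_label)  # www

# Without a scheme there is no network location, so hostname is None.
print(urlparse('abcdc.com').hostname)  # None
```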


Answer 6:

In one line:

text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]
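The len(suffix) == 0 check in that expression is not decoration. Every string ends with the empty string, and -len('') is 0, so the slice would become text[:0]. A small sketch, with the one-liner wrapped in a hypothetical function strip_suffix:

```python
def strip_suffix(text, suffix):
    # Same expression as the one-liner, just given a name.
    return text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]


text = 'abcdc.com'
print(strip_suffix(text, '.com'))  # abcdc
print(strip_suffix(text, ''))      # abcdc.com

# Without the length check, an empty suffix wipes out the whole string:
naive = text[:-len('')]
print(repr(naive))  # ''
```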


Answer 7:

How about url[:-4]?



Answer 8:

If you know it's an extension, then

url = 'abcdc.com'
...
url.rsplit('.', 1)[0]  # split at '.', starting from the right, maximum 1 split

This works equally well with abcdc.com or www.abcdc.com or abcdc.[anything] and is more extensible.



Answer 9:

For URLs (which seem to be part of the topic, given the example), one can do something like this:

import os
url = 'http://www.stackoverflow.com'
name, ext = os.path.splitext(url)
print((name, ext))  # print the pair as a tuple

# Or:
ext = '.' + url.split('.')[-1]
name = url[:-len(ext)]
print((name, ext))

Both will output: ('http://www.stackoverflow', '.com')

This can also be combined with str.endswith(suffix) if you need to just split ".com", or anything specific.
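Two things are worth checking before relying on this for URLs. os.path.splitext only looks at the last path component, so a URL with a path behaves differently, and the endswith combination mentioned above can be sketched as a small helper. The name split_suffix and the second and third URLs are hypothetical.

```python
import os

# With no path, the last dot is the one before 'com'.
print(os.path.splitext('http://www.stackoverflow.com'))
# ('http://www.stackoverflow', '.com')

# With a path, the last component has no dot, so nothing is split off.
with_path = os.path.splitext('http://www.stackoverflow.com/questions')
print(with_path)
# ('http://www.stackoverflow.com/questions', '')


def split_suffix(url, suffix):
    """Split off suffix only if url really ends with it."""
    if suffix and url.endswith(suffix):
        return url[:-len(suffix)], suffix
    return url, ''


print(split_suffix('http://www.stackoverflow.com', '.com'))
# ('http://www.stackoverflow', '.com')
```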



Answer 10:

url.rsplit('.com', 1)

is not quite right.

What you actually would need to write is

url.rsplit('.com', 1)[0]

and it looks pretty succinct, IMHO.

However, my personal preference is this option because it uses only one parameter:

url.rpartition('.com')[0]
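The two calls agree when '.com' is present, but they differ when it is missing, and both cut at the last occurrence, not strictly at the end. A short comparison on made-up inputs:

```python
results = {}
for url in ('abcdc.com', 'abcdc.org', 'abc.com.au'):
    via_rsplit = url.rsplit('.com', 1)[0]
    via_rpartition = url.rpartition('.com')[0]
    results[url] = (via_rsplit, via_rpartition)
    print(url, repr(via_rsplit), repr(via_rpartition))

# abcdc.com 'abcdc' 'abcdc'
# abcdc.org 'abcdc.org' ''      <- rpartition yields '' when nothing is found
# abc.com.au 'abc' 'abc'        <- both also drop the '.au' after '.com'
```

If the suffix might be absent or might appear mid-string, an endswith check first is the safer route.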


Answer 11:

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    return re.sub(suffix + '$', '', url)

I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:

def rm_dotcom(url = 'abcdc.com'):
    return(url[:-4] if url.endswith('.com') else url)

However, if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using regular expressions is code reusability. What if you next want to remove '.me', which only has three characters?

The same code would do the trick:

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'
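One thing to watch with a reusable regex version: the suffix is interpreted as a pattern, so an unescaped '.' matches any character. A sketch that treats the suffix as literal text, using re.escape (the function name rm_literal_suffix and the 'abcdcame' input are hypothetical):

```python
import re


def rm_literal_suffix(text, suffix):
    """Remove suffix from the end of text, treating suffix as plain text."""
    # re.escape neutralises regex metacharacters such as '.'.
    # \Z anchors at the very end of the string.
    return re.sub(re.escape(suffix) + r'\Z', '', text)


print(rm_literal_suffix('abcdc.me', '.me'))  # abcdc
print(rm_literal_suffix('abcdcame', '.me'))  # abcdcame

# Unescaped, '.me$' also matches 'ame':
unescaped = re.sub('.me' + '$', '', 'abcdcame')
print(unescaped)  # abcdc
```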


Answer 12:

This is a perfect use for regular expressions:

>>> import re
>>> re.match(r"(.*)\.com$", "hello.com").group(1)
'hello'
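Two details matter when using re.match this way: the pattern needs a `$` anchor to match only at the end, and re.match returns None when nothing matches, so calling .group(1) directly raises AttributeError. A guarded sketch (the name drop_dotcom and the extra inputs are hypothetical):

```python
import re


def drop_dotcom(text):
    """Return text without a trailing '.com'; unchanged if there is none."""
    m = re.match(r"(.*)\.com$", text)
    # m is None when text does not end in '.com'.
    return m.group(1) if m else text


print(drop_dotcom('hello.com'))  # hello
print(drop_dotcom('hello.org'))  # hello.org

# Without the '$' anchor, '.com' in the middle matches too:
unanchored = re.match(r"(.*)\.com", "hello.com.au").group(1)
print(unanchored)  # hello
```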


Answer 13:

Or you can use split:

a = 'abccomputer.com'
res = a.split('.com',1)[0]


Answer 14:

import re

def remove_file_type(infile):
    # Remove the last '.' and everything after it.
    return re.sub(r'\.[^.]*$', '', infile)

>>> remove_file_type('abc.efg')
'abc'


Answer 15:

In my case I needed to raise an exception so I did:

class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]


Answer 16:

If you mean to strip only the extension:

url = 'abcdc.com'
print('.'.join(url.split('.')[:-1]))

It works with any extension, and with other dots in the filename as well. It simply splits the string into a list on dots and joins it back without the last element.

Probably not the fastest, but for me it's more readable than other methods.
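One edge case to keep in mind with split and join: a name with no dot at all becomes a one-element list, so dropping the last element leaves nothing. A guarded sketch (the function name and file names are hypothetical):

```python
def strip_last_part(name):
    """Drop the last dot-separated part, but only if there is a dot."""
    parts = name.split('.')
    return '.'.join(parts[:-1]) if len(parts) > 1 else name


print(strip_last_part('abcdc.com'))       # abcdc
print(strip_last_part('archive.tar.gz'))  # archive.tar
print(strip_last_part('README'))          # README

# The unguarded expression returns an empty string for a dotless name:
unguarded = '.'.join('README'.split('.')[:-1])
print(repr(unguarded))  # ''
```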


