Python: How can I include the delimiter(s) in a st

Posted 2020-02-07 00:12

I would like to split a string on multiple delimiters, but keep the delimiters in the resulting list. I think this is a useful thing to do as an initial step of parsing any kind of formula, and I suspect there is a nice Python solution.

Someone asked a similar question in Java here.

For example, a typical split looks like this:

>>> s='(twoplusthree)plusfour'
>>> s.split('plus')
['(two', 'three)', 'four']

But I'm looking for a nice way to add the plus back in (or retain it):

['(two', 'plus', 'three)', 'plus', 'four']

Ultimately I'd like to do this for each operator and bracket, so if there's a way to get

['(', 'two', 'plus', 'three', ')', 'plus', 'four']

all in one go, then all the better.

5 answers
Explosion°爆炸
#2 · 2020-02-07 00:43

Here is an easy way using re.split:

import re

s = '(twoplusthree)plusfour'
re.split(r'(plus)', s)

Output:

['(two', 'plus', 'three)', 'plus', 'four']

re.split works much like str.split, except that the delimiter is a regex pattern rather than a literal string. The trick here is to put () around the pattern: when the pattern contains a capturing group, the matched text is included in the result list.

Bear in mind that the result will contain empty strings if the delimiter pattern matches twice in a row, or at the very start or end of the string.
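
To make the empty-string caveat concrete, here is a small sketch. The sample strings are made up for illustration and are not from the question; it only shows where the empty strings turn up and one possible way to drop them:

```python
import re

# Two delimiters in a row leave an empty string between them.
doubled = re.split(r'(plus)', 'twoplusplusfour')

# A delimiter at the very start leaves an empty string at the front.
leading = re.split(r'(plus)', 'plusfour')

# Keeping only non-empty items removes the blanks but keeps the delimiters.
cleaned = [tok for tok in doubled if tok]

print(doubled)   # ['two', 'plus', '', 'plus', 'four']
print(leading)   # ['', 'plus', 'four']
print(cleaned)   # ['two', 'plus', 'plus', 'four']
```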

兄弟一词,经得起流年.
#3 · 2020-02-07 00:44
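import re
s = '(twoplusthree)plusfour'
# The capturing group keeps each matched delimiter in the result.
l = re.split(r"(plus|\(|\))", s)
# Drop the empty strings left where delimiters are adjacent or at the edges.
a = [x for x in l if x != '']
print(a)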

output:

['(', 'two', 'plus', 'three', ')', 'plus', 'four']
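
If the delimiters are only known at run time, the same idea can probably be generalised by building the pattern from a list. This is just a sketch: split_keep is a hypothetical helper name, and sorting the delimiters longest-first is my own assumption to stop a short delimiter from shadowing a longer one that starts with it.

```python
import re

def split_keep(text, delimiters):
    """Split text on any of the delimiters, keeping them as separate tokens."""
    # re.escape makes characters such as '(' match literally.
    # Longest first, so a longer delimiter wins over its own prefix.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(d) for d in ordered) + ')'
    # Filter out the empty strings produced by adjacent or leading delimiters.
    return [tok for tok in re.split(pattern, text) if tok]

tokens = split_keep('(twoplusthree)plusfour', ['plus', '(', ')'])
print(tokens)  # ['(', 'two', 'plus', 'three', ')', 'plus', 'four']
```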
仙女界的扛把子
#4 · 2020-02-07 00:54

Here I'm splitting a string on the first occurrence of an alphabetic character:

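import re

def split_on_first_alpha(i):
    # Example input: i = "3.5 This is one of the way"
    # Split once at the first letter; the letter itself is consumed by the split.
    split_1 = re.split(r'[a-z]', i, maxsplit=1, flags=re.IGNORECASE)
    # Find the letters so the consumed one can be put back.
    find_starting = re.findall(r'[a-z]', i, flags=re.IGNORECASE)
    split_1[1] = find_starting[0] + split_1[1]
    return split_1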
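
Note that the function above raises an IndexError when the input contains no letters at all. A possible variant, sketched here under the assumption that returning the whole string in a one-item list is acceptable in that case (split_at_first_letter is a hypothetical name), uses re.search and slicing instead:

```python
import re

def split_at_first_letter(text):
    # Locate the first ASCII letter, ignoring case.
    match = re.search(r'[a-z]', text, flags=re.IGNORECASE)
    if match is None:
        # Assumption: with no letter present, return the input unsplit.
        return [text]
    # Slice around the match so the letter stays with the second part.
    return [text[:match.start()], text[match.start():]]

print(split_at_first_letter('3.5 This is one of the way'))
# ['3.5 ', 'This is one of the way']
```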
做自己的国王
#5 · 2020-02-07 00:55

You can do that with Python's re module.

import re
s='(twoplusthree)plusfour'
list(filter(None, re.split(r"(plus|[()])", s)))

You can leave out the list if you only need an iterator.
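
As a rough illustration of the iterator remark (assuming Python 3, where filter returns a lazy iterator rather than a list), the tokens can be consumed one at a time, but only once:

```python
import re

s = '(twoplusthree)plusfour'
# filter(None, ...) drops falsy items, i.e. the empty strings.
tokens = filter(None, re.split(r"(plus|[()])", s))

first = next(tokens)     # pull a single token
rest = list(tokens)      # consume whatever is left
leftover = list(tokens)  # the iterator is now exhausted

print(first)     # (
print(rest)      # ['two', 'plus', 'three', ')', 'plus', 'four']
print(leftover)  # []
```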

三岁会撩人
#6 · 2020-02-07 01:00

This thread is old, but since it's the top Google result I thought I'd add this:

If you don't want to use regex, there is a simpler way to do it. Basically, just call split, then put the separator back on every token except the last.

def split_keep_deli(string_to_split, deli):
    result_list = []
    tokens = string_to_split.split(deli)
    # Re-attach the delimiter to every token except the last one.
    for i in range(len(tokens) - 1):
        result_list.append(tokens[i] + deli)
    result_list.append(tokens[-1])
    return result_list
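
For comparison, here is a compact sketch of the same no-regex idea run on the string from the question (split_keep_trailing is a hypothetical name used only for this example). Note that the delimiter stays attached to the end of the preceding token instead of becoming its own list item, so the result differs from the re.split answers:

```python
def split_keep_trailing(text, deli):
    # Same approach: split, then glue the delimiter back onto
    # every token except the last.
    tokens = text.split(deli)
    return [tok + deli for tok in tokens[:-1]] + [tokens[-1]]

result = split_keep_trailing('(twoplusthree)plusfour', 'plus')
print(result)  # ['(twoplus', 'three)plus', 'four']
```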