Grab a line's whitespace/indention with Python

Posted 2020-02-01 03:29

Basically, if I have a line of text which starts with indention, what's the best way to grab that indention and put it into a variable in Python? For example, if the line is:

\t\tthis line has two tabs of indention

Then it would return '\t\t'. Or, if the line was:

    this line has four spaces of indention

Then it would return four spaces.

So I guess you could say that I just need to strip everything from the first non-whitespace character to the end of the string. Thoughts?

6 answers
孤傲高冷的网名
Reply #2 · 2020-02-01 04:04

A sneaky way: abuse lstrip!

fullstr = "\t\tthis line has two tabs of indentation"
startwhites = fullstr[:len(fullstr)-len(fullstr.lstrip())]

This way you don't have to work through all the details of whitespace!

(Thanks Adam for the correction)
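
The slicing trick above can be wrapped in a small helper. This is only a minimal sketch: the name leading_whitespace is made up for illustration and does not come from the answer.

```python
def leading_whitespace(line):
    """Return the run of whitespace at the start of `line` (hypothetical helper)."""
    # lstrip() removes only leading whitespace, so the length difference
    # is exactly the number of leading whitespace characters.
    indent_len = len(line) - len(line.lstrip())
    return line[:indent_len]


print(repr(leading_whitespace("\t\tthis line has two tabs of indentation")))
print(repr(leading_whitespace("    this line has four spaces of indentation")))
print(repr(leading_whitespace("no indentation")))
```

Computing the slice end from the front (len(line) minus the stripped length) keeps the helper well-behaved for lines that have no indentation at all.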

走好不送
Reply #3 · 2020-02-01 04:07
import re
s = "\t\tthis line has two tabs of indention"
re.match(r"\s*", s).group()
# returns "\t\t"
s = "    this line has four spaces of indention"
re.match(r"\s*", s).group()
# returns "    "

And to strip leading spaces, use lstrip.
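
One detail worth keeping in mind, shown here as a small sketch rather than as part of the original answer: \s matches every whitespace character, including the newline. If the line still carries its trailing "\n" and contains nothing but indentation, \s* will swallow the newline as well. The character class [ \t]* is one possible alternative, assuming only spaces and tabs count as indentation; the variable names are illustrative.

```python
import re

# \s matches any whitespace, including "\n"; [ \t] is limited to spaces and tabs.
ANY_WHITESPACE = re.compile(r"\s*")
SPACES_AND_TABS = re.compile(r"[ \t]*")

blank_indented_line = "    \n"  # e.g. a line read from a file, newline included

any_ws = ANY_WHITESPACE.match(blank_indented_line).group()
spaces_tabs = SPACES_AND_TABS.match(blank_indented_line).group()
print(repr(any_ws))       # '    \n'
print(repr(spaces_tabs))  # '    '
```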


As there are downvotes, probably questioning the efficiency of regex, I've done some profiling to check the efficiency in each case.

Very long string, very short leading space

RegEx > Itertools >> lstrip

>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s="          hello world!"*10000', number=100000)
0.10037684440612793
>>> timeit.timeit('"".join(itertools.takewhile(lambda x:x.isspace(),s))', 'import itertools;s="          hello world!"*10000', number=100000)
0.7092740535736084
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s="          hello world!"*10000', number=100000)
0.51730513572692871
>>> timeit.timeit('s[:-len(s.lstrip())]', 's="          hello world!"*10000', number=100000)
2.6478431224822998

Very short string, very short leading space

lstrip > RegEx > Itertools

If you can limit the string's length to thousands of characters or fewer, the lstrip trick may be better.

>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s="          hello world!"*100', number=100000)
0.099548101425170898
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s="          hello world!"*100', number=100000)
0.53602385520935059
>>> timeit.timeit('s[:-len(s.lstrip())]', 's="          hello world!"*100', number=100000)
0.064291000366210938

This shows the lstrip trick scales roughly as O(n), where n is the length of the string, while the RegEx and itertools methods are O(1) as long as there are only a few leading spaces.

Very short string, very long leading space

lstrip >> RegEx >>> Itertools

If there are a lot of leading spaces, don't use RegEx.

>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*2000', number=10000)
0.047424077987670898
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*2000', number=10000)
0.2433168888092041
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*2000', number=10000)
3.9949162006378174
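
A caveat about the expression s[:-len(s.lstrip())] when the string is whitespace only, as in the timings above: lstrip() then returns an empty string, the slice end becomes -0, which is just 0, and the result is an empty string instead of the whitespace. The timing is probably still representative since the lstrip call does the bulk of the work, but for real use a slice computed from the front avoids the problem. A minimal sketch (variable names are illustrative):

```python
s = " " * 2000  # whitespace only, as in the timings above

negative_slice = s[:-len(s.lstrip())]          # -0 == 0, so this is s[:0]
explicit_slice = s[:len(s) - len(s.lstrip())]  # slice end computed from the front

print(len(negative_slice))  # 0
print(len(explicit_slice))  # 2000
```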

Very long string, very long leading space

lstrip >>> RegEx >>>>>>>> Itertools

>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*200000', number=10000)
4.2374031543731689
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*200000', number=10000)
23.877214908599854
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*200000', number=100)*100
415.72158336639404

This shows all methods scale roughly as O(m), where m is the number of leading whitespace characters, if the non-space part is short.
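
To rerun this kind of comparison on your own machine, something along these lines could serve as a starting point. It is a rough sketch, not the setup used for the numbers above: the function names, the trailing "x" added to the all-space inputs, and the small repetition count are all assumptions made for illustration, and the lambda adds a little call overhead to every measurement.

```python
import itertools
import re
import timeit

WHITESPACE = re.compile(r"\s*")


def by_regex(s):
    return WHITESPACE.match(s).group()


def by_takewhile(s):
    return "".join(itertools.takewhile(str.isspace, s))


def by_lstrip(s):
    # Slice end computed from the front, so whitespace-only input also works.
    return s[:len(s) - len(s.lstrip())]


# Inputs modelled on the four cases above; the trailing "x" on the
# all-space inputs is an assumption of this sketch.
cases = {
    "long string, short indent": "          hello world!" * 10000,
    "short string, short indent": "          hello world!" * 100,
    "short string, long indent": " " * 2000 + "x",
    "long string, long indent": " " * 200000 + "x",
}

for label, text in cases.items():
    # All three approaches should agree before their timings are compared.
    assert by_regex(text) == by_takewhile(text) == by_lstrip(text)
    for func in (by_regex, by_takewhile, by_lstrip):
        # number is kept small so the whole script finishes quickly.
        seconds = timeit.timeit(lambda: func(text), number=20)
        print(f"{label:28s} {func.__name__:13s} {seconds:.6f}s")
```

Absolute numbers will differ between Python versions and machines, so only the relative ordering within one run is meaningful.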

够拽才男人
Reply #4 · 2020-02-01 04:08
def whites(a):
    # Everything before the first occurrence of the stripped text is the indentation.
    return a[0:a.find(a.strip())]

Basically, my idea is:

  1. Strip the whitespace from the original line
  2. Take the difference between the original line and the stripped one
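
The two steps can be traced with a short sketch. One behaviour to be aware of: for a whitespace-only line, strip() returns an empty string and find("") is 0, so the function returns an empty string. Whether that is acceptable depends on the use case. whites_lstrip below is a hypothetical variant, not part of the answer, that measures the indentation by length instead of searching.

```python
def whites(a):
    return a[0:a.find(a.strip())]


def whites_lstrip(a):
    # Hypothetical variant: measure the indentation by length instead of searching.
    return a[:len(a) - len(a.lstrip())]


print(repr(whites("  text  ")))    # '  '
print(repr(whites("   ")))         # '' because "   ".strip() == "" and find("") == 0
print(repr(whites_lstrip("   ")))  # '   '
```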
该账号已被封号
Reply #5 · 2020-02-01 04:14

This can also be done with str.isspace and itertools.takewhile instead of regex.

import itertools

tests=['\t\tthis line has two tabs of indention',
       '    this line has four spaces of indention']

def indention(astr):
    # Using itertools.takewhile is efficient -- the looping stops immediately after the first
    # non-space character.
    return ''.join(itertools.takewhile(str.isspace,astr))

for test_string in tests:
    print(indention(test_string))
萌系小妹纸
Reply #6 · 2020-02-01 04:15

How about using the regex \s*, which matches any run of whitespace characters? You only want the whitespace at the beginning of the line, so either search with the regex ^\s* or simply match with \s*.
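
In Python's re module, the difference between the two options looks roughly like this (a small sketch; the variable names are illustrative):

```python
import re

line = "\t\tthis line has two tabs of indention"

# re.match only ever matches at the start of the string ...
matched = re.match(r"\s*", line).group()
# ... while re.search scans forward, so the pattern needs ^ to pin it to the start.
searched = re.search(r"^\s*", line).group()

print(repr(matched), repr(searched))

# Without the anchor, re.search can succeed further into the string.
unanchored = re.search(r"\s+", "word then space").group()
print(repr(unanchored))  # ' '
```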

淡お忘
Reply #7 · 2020-02-01 04:23

If you're interested in using regular expressions you can use that. /\s/ usually matches one whitespace character, so /^\s+/ would match the whitespace starting a line.
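
Note that \s+ requires at least one whitespace character, so in Python a line without indentation yields no match object at all. A minimal sketch of guarding against that, using a hypothetical helper named indent_or_empty:

```python
import re

LEADING = re.compile(r"^\s+")


def indent_or_empty(line):
    # Hypothetical helper: \s+ needs at least one character, so an
    # unindented line produces None instead of a match object.
    m = LEADING.match(line)
    return m.group() if m else ""


print(repr(indent_or_empty("    four spaces")))  # '    '
print(repr(indent_or_empty("no indent")))        # ''
```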
