How to receive regex from command line in Python

Posted 2020-03-05 03:33

Question:

I want to receive a delimiter like '\t' (tab) from command line, and use it to parse a text file.

If I put

delimiter = sys.argv[1]

in the code, and type from the command line

$ python mycode.py "\t"

delimiter is '\\t', i.e., Python preserves the input string as is (a literal backslash followed by t).

I want to convert this to '\t' so that I can use e.g.,

'a\tb\tc'.split(delimiter)

to get ['a','b','c'].

I've tried to convert '\\' to '\', but failed.

Is there a built-in Python function to handle regex from the command line?

Answer 1:

In Python 2 you can use str.decode('string_escape'):

>>> '\\t'.decode('string_escape')
'\t'

In Python 3 you have to encode the string to bytes first and then use unicode_escape:

>>> '\\t'.encode().decode('unicode_escape')
'\t'

Both solutions decode the standard escape sequences such as \t, \n and \xNN. The Python 3 version also understands \uXXXX, so you could even use some fancy Unicode stuff:

>>> '\\t\\n\\u2665'.encode().decode('unicode_escape')
'\t\n♥'
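
One caveat with the plain encode().decode('unicode_escape') round trip: the default UTF-8 encoding mangles non-ASCII characters that are typed directly, because unicode_escape reads the bytes as Latin-1. The sketch below works around that by encoding to Latin-1 with the backslashreplace error handler. The names unescape and split_fields are hypothetical helpers, not part of the original answer.

```python
def unescape(arg):
    """Decode backslash escapes in arg, e.g. the two characters \\t become a tab.

    Hypothetical helper: characters above U+00FF are first rewritten as
    \\uXXXX escapes, everything else is stored as Latin-1 bytes, so literal
    non-ASCII input survives the unicode_escape decoding step.
    """
    raw = arg.encode("latin-1", "backslashreplace")
    return raw.decode("unicode_escape")


def split_fields(line, raw_delimiter):
    """Split line on a delimiter given in escaped form (as in sys.argv[1])."""
    return line.split(unescape(raw_delimiter))


if __name__ == "__main__":
    import sys

    # Assumed usage: python mycode.py "\t" < input.txt
    delimiter = sys.argv[1] if len(sys.argv) > 1 else "\\t"
    for line in sys.stdin:
        print(split_fields(line.rstrip("\n"), delimiter))
```

Malformed escapes (for example a lone \x) make the decode step raise an exception, so a real script would probably want to catch that and report a usage error.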


Answer 2:

It's not really regexp you're looking for, it's escape sequences.

You could use eval, as long as you're perfectly aware of the terrible security consequences, or roll your own string replacement/regexp based escape sequence unescaper.
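
If evaluating the argument as a string literal is what you want, ast.literal_eval is a less dangerous relative of eval, since it only accepts Python literals and never calls functions. This is a minimal sketch under that assumption; unescape_literal is a hypothetical name, and arguments containing a double quote are simply rejected rather than handled.

```python
import ast


def unescape_literal(arg):
    """Interpret arg as the body of a double-quoted Python string literal."""
    try:
        # Wrap the argument in quotes so that it parses as a string literal.
        value = ast.literal_eval('"' + arg + '"')
    except (SyntaxError, ValueError) as exc:
        raise ValueError("not a valid string body: %r" % (arg,)) from exc
    # An embedded quote could turn the text into some other literal
    # (e.g. a tuple), so accept only a plain string result.
    if not isinstance(value, str):
        raise ValueError("not a valid string body: %r" % (arg,))
    return value
```

For example, unescape_literal("\\t") gives a one-character tab string, while an argument such as __import__("os") raises ValueError instead of being executed.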

(Who knows, maybe arg = arg.replace("\\t", "\t") is enough for you?)
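
A slightly more general version of that replace call, as a sketch of the "roll your own" option: a single regex pass over a small whitelist of escapes. The set of supported escapes and the names ESCAPES and unescape_simple are assumptions chosen for illustration. Doing it in one pass avoids the ordering problems of chained replace calls, so an escaped backslash followed by t stays a backslash and a t.

```python
import re

# Whitelisted escapes (an assumed set; extend as needed).
ESCAPES = {
    "\\t": "\t",   # backslash + t  -> tab
    "\\n": "\n",   # backslash + n  -> newline
    "\\r": "\r",   # backslash + r  -> carriage return
    "\\\\": "\\",  # two backslashes -> one backslash
}

# Matches a backslash followed by t, n, r or another backslash.
_ESCAPE_RE = re.compile(r"\\[tnr\\]")


def unescape_simple(arg):
    """Replace only the whitelisted escapes; anything else is left alone."""
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], arg)
```

Unknown sequences such as \q pass through unchanged, which may or may not be what you want for a delimiter option.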

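As a workaround you could do

$ python mycode.py "`echo -ne '\t'`"

to (ab)use the Unix echo command to do the unescaping for you. The double quotes around the backticks are needed; without them the shell's word splitting discards the tab.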