Get str repr with double quotes Python

2019-06-15 03:02发布

问题:

I'm using a small Python script to generate some binary data that will be used in a C header.

This data should be declared as a char[], and it will be nice if it could be encoded as a string (with the pertinent escape sequences when they are not in the range of ASCII printable chars) to keep the header more compact than with a decimal or hexadecimal array encoding.

The problem is that when I print the repr of a Python string, it is delimited by single quotes, and C doesn't like that. The naive solution is to do:

'"%s"'%repr(data)[1:-1]

but that doesn't work when one of the bytes in the data happens to be a double quote, so I'd need them to be escaped too.

I think a simple replace('"', '\\"') could do the job, but maybe there's a better, more pythonic solution out there.

Extra point:

It would be convenient too to split the data in lines of approximately 80 characters, but again the simple approach of splitting the source string in chunks of size 80 won't work, as each non printable character takes 2 or 3 characters in the escape sequence. Splitting the list in chunks of 80 after getting the repr won't help either, as it could divide escape sequence.

Any suggestions?

回答1:

repr() isn't what you want. There's a fundamental problem: repr() can use any representation of the string that can be evaluated as Python to produce the string. That means, in theory, that it might decide to use any number of other constructs which wouldn't be valid in C, such as """long strings""".

This code is probably the right direction. I've used a default of wrapping at 140, which is a sensible value for 2009, but if you really want to wrap your code to 80 columns, just change it.

If unicode=True, it outputs a L"wide" string, which can store Unicode escapes meaningfully. Alternatively, you might want to convert Unicode characters to UTF-8 and output them escaped, depending on the program you're using them in.

def string_to_c(s, max_length = 140, unicode=False):
    ret = []

    # Try to split on whitespace, not in the middle of a word.
    split_at_space_pos = max_length - 10
    if split_at_space_pos < 10:
        split_at_space_pos = None

    position = 0
    if unicode:
        position += 1
        ret.append('L')

    ret.append('"')
    position += 1
    for c in s:
        newline = False
        if c == "\n":
            to_add = "\\\n"
            newline = True
        elif ord(c) < 32 or 0x80 <= ord(c) <= 0xff:
            to_add = "\\x%02x" % ord(c)
        elif ord(c) > 0xff:
            if not unicode:
                raise ValueError, "string contains unicode character but unicode=False"
            to_add = "\\u%04x" % ord(c)
        elif "\\\"".find(c) != -1:
            to_add = "\\%c" % c
        else:
            to_add = c

        ret.append(to_add)
        position += len(to_add)
        if newline:
            position = 0

        if split_at_space_pos is not None and position >= split_at_space_pos and " \t".find(c) != -1:
            ret.append("\\\n")
            position = 0
        elif position >= max_length:
            ret.append("\\\n")
            position = 0

    ret.append('"')

    return "".join(ret)

print string_to_c("testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing testing", max_length = 20)
print string_to_c("Escapes: \"quote\" \\backslash\\ \x00 \x1f testing \x80 \xff")
print string_to_c(u"Unicode: \u1234", unicode=True)
print string_to_c("""New
lines""")


回答2:

If you're asking a python str for its repr, I don't think the type of quote is really configurable. From the PyString_Repr function in the python 2.6.4 source tree:

    /* figure out which quote to use; single is preferred */
    quote = '\'';
    if (smartquotes &&
        memchr(op->ob_sval, '\'', Py_SIZE(op)) &&
        !memchr(op->ob_sval, '"', Py_SIZE(op)))
        quote = '"';

So, I guess use double quotes if there is a single quote in the string, but don't even then if there is a double quote in the string.

I would try something like writing my own class to contain the string data instead of using the built in string to do it. One option would be deriving a class from str and writing your own repr:

class MyString(str):
    __slots__ = []
    def __repr__(self):
        return '"%s"' % self.replace('"', r'\"')

print repr(MyString(r'foo"bar'))

Or, don't use repr at all:

def ready_string(string):
    return '"%s"' % string.replace('"', r'\"')

print ready_string(r'foo"bar')

This simplistic quoting might not do the "right" thing if there's already an escaped quote in the string.



回答3:

Better not hack the repr() but use the right encoding from the beginning. You can get the repr's encoding directly with the encoding string_escape

>>> "naïveté".encode("string_escape")
'na\\xc3\\xafvet\\xc3\\xa9'
>>> print _
na\xc3\xafvet\xc3\xa9

For escaping the "-quotes I think using a simple replace after escape-encoding the string is a completely unambiguous process:

>>> '"%s"' % 'data:\x00\x01 "like this"'.encode("string_escape").replace('"', r'\"')
'"data:\\x00\\x01 \\"like this\\""'
>>> print _
"data:\x00\x01 \"like this\""


回答4:

You could try json.dumps:

>>> import json
>>> print(json.dumps("hello world"))
"hello world"

>>> print(json.dumps('hëllo "world"!'))
"h\u00ebllo \"world\"!"

I don't know for sure whether json strings are compatible with C but at least they have a pretty large common subset and are guaranteed to be compatible with javascript;).