单与Python中的JSON负载双引号(Single versus double quotes in

2019-07-18 06:00发布

我注意到,单引号引起simplejsonloads运作失败:

>>> import simplejson as json
>>> json.loads("\"foo\"")
'foo'
>>> json.loads("\'foo\'")
Traceback (most recent call last):
...
ValueError: No JSON object could be decoded

我解析之类的东西: foo = ["a", "b", "c"]从一个文本文件到Python列表,并想同时接受foo = ['a', 'b', 'c']simplejson便于使foo自动到一个列表。

我怎样才能得到loads接受单引号,或者自动为单引号代替双不破坏输入? 谢谢。

Answer 1:

使用正确的工具的工作,你是不是解析JSON但是Python,所以使用ast.literal_eval()代替:

>>> import ast
>>> ast.literal_eval('["a", "b", "c"]')
['a', 'b', 'c']
>>> ast.literal_eval("['a', 'b', 'c']")
['a', 'b', 'c']
>>> ast.literal_eval('["mixed", \'quoting\', """styles"""]')
['mixed', 'quoting', 'styles']
  • JSON文件总是使用字符串双引号,使用UTF-16 \uhhhh十六进制转义语法,有{...}对象的键值对的钥匙总是字符串和序列总是[...]列表,并使用nulltruefalse值; 注意是小写的布尔值。 数以整数和浮点形式。

  • 在Python中,字符串表示可以使用单引号和双引号,Unicode转义使用\uhhhh\Uhhhhhhhh形式(无UTF-16代理对),使用词典{...}显示语法可以在许多不同的类型,而不是只是字符串键,序列可以是列表( [...]但也可以使用元组( (...)或者你可以有其他容器类型依然。 Python有NoneTrueFalse (首字母大写!)和数字进来整数,浮点数和复杂的形式。

令人迷惑的与其他可解码时发生了成功,但数据已经被错误地解释,如与逃脱非BMP代码点这样的表情符号要么导致解析错误或微妙的问题。 请务必使用正确的方法来解码! 在大多数情况下,当你有Python语法数据居然有人使用的编码,仅偶然产生的Python表示的错误的方法。 见如果源需要在这种情况下固定; 通常输出是通过使用产生的str(object) ,其中json.dumps(obj)应该被代替。



Answer 2:

我在这个问题StackOverflow上开始寻找这种解决自己的时候。

该解决方案使用ast.literal_eval()为我所有的情况下,没有工作,作为文本也偶尔有布尔常量true / false,这并不被认为是Python的令牌(这是大写)。

为了解决这个问题我自己,我工作坚持一个自定义的JSONDecoder它插入标准的JSON Python包。

pip install git+https://github.com/jpz/tolerantjsondecoder.git

也许这可能是使用下一个人的。

此外,要注意的,我完成了这个之后,我后来发现demjson这似乎是一个更完整的解决方案库,但我还没有评估它。



Answer 3:

Demjson做你的工作,但它是非常缓慢的,我的意思是非常缓慢的比较simplejson。 我不推荐它的生产环境。

ast.literal_eval()和YAML也因此你需要像simplejson更稳定的解决方案不能在所有JSON工作。

如果你调整simplejson一点点的话,就可以做你的工作。 我自己已经做到了和共享此代码。

我告诉它在2点

1)从GitHub下载simplejson并把它添加到您的项目。 2)现在simplejson具有decoder.py Python文件。 替换该文件的代码,此代码

"""Implementation of JSONDecoder
"""
from __future__ import absolute_import
import re
import sys
import struct
from .compat import u, text_type, binary_type, PY3, unichr
from .scanner import make_scanner, JSONDecodeError

def _import_c_scanstring():
    try:
        from ._speedups import scanstring
        return scanstring
    except ImportError:
        return None
c_scanstring = _import_c_scanstring()

# NOTE (3.1.0): JSONDecodeError may still be imported from this module for
# compatibility, but it was never in the __all__
__all__ = ['JSONDecoder']

FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL

def _floatconstants():
    if sys.version_info < (2, 6):
        _BYTES = '7FF80000000000007FF0000000000000'.decode('hex')
        nan, inf = struct.unpack('>dd', _BYTES)
    else:
        nan = float('nan')
        inf = float('inf')
    return nan, inf, -inf

NaN, PosInf, NegInf = _floatconstants()

_CONSTANTS = {
    '-Infinity': NegInf,
    'Infinity': PosInf,
    'NaN': NaN,
}

STRINGCHUNK = re.compile(r'(.*?)(["\\\x00-\x1f])', FLAGS)

# Changed Code Here. Added These Two Lines

STRINGCHUNKUNQUOTED = re.compile(r'(.*?)([:\\\x00-\x1f])', FLAGS)
STRINGCHUNKSINGLEQUOTED = re.compile(r"(.*?)(['\\\x00-\x1f])", FLAGS)

BACKSLASH = {
    '"': u('"'), '\\': u('\u005c'), '/': u('/'),
    'b': u('\b'), 'f': u('\f'), 'n': u('\n'), 'r': u('\r'), 't': u('\t'),
}
# Changed Code Here.
SINGLE_QUOTE_BACKSLASH = {
    "'": u("'"), '\\': u('\u005c'), '/': u('/'),
    'b': u('\b'), 'f': u('\f'), 'n': u('\n'), 'r': u('\r'), 't': u('\t'),
}

DEFAULT_ENCODING = "utf-8"

# Changed Code Here.
def parse_single_quoted_string(s, end, encoding=None, strict=True):
    return py_scanstring(s, end, encoding, strict, SINGLE_QUOTE_BACKSLASH, STRINGCHUNKSINGLEQUOTED.match)


def py_scanstring(s, end, encoding=None, strict=True,
        _b=BACKSLASH, _m=STRINGCHUNK.match, _join=u('').join,
        _PY3=PY3, _maxunicode=sys.maxunicode):
    """Scan the string s for a JSON string. End is the index of the
    character in s after the quote that started the JSON string.
    Unescapes all valid JSON string escape sequences and raises ValueError
    on attempt to decode an invalid string. If strict is False then literal
    control characters are allowed in the string.

    Returns a tuple of the decoded string and the index of the character in s
    after the end quote."""
    if encoding is None:
        encoding = DEFAULT_ENCODING
    chunks = []
    _append = chunks.append
    begin = end - 1
    while 1:
        chunk = _m(s, end)
        if chunk is None:
            raise JSONDecodeError(
                "Unterminated string starting at", s, begin)
        end = chunk.end()
        content, terminator = chunk.groups()
        # Content is contains zero or more unescaped string characters
        if content:
            if not _PY3 and not isinstance(content, text_type):
                content = text_type(content, encoding)
            _append(content)
        # Terminator is the end of string, a literal control character,
        # or a backslash denoting that an escape sequence follows

        # Changed Code Here.

        if not is_not_quote(terminator):
            break
        elif terminator != '\\':
            if strict:
                msg = "Invalid control character %r at"
                raise JSONDecodeError(msg, s, end)
            else:
                _append(terminator)
                continue
        try:
            esc = s[end]
        except IndexError:
            raise JSONDecodeError(
                "Unterminated string starting at", s, begin)
        # If not a unicode escape sequence, must be in the lookup table
        if esc != 'u':
            try:
                char = _b[esc]
            except KeyError:
                msg = "Invalid \\X escape sequence %r"
                raise JSONDecodeError(msg, s, end)
            end += 1
        else:
            # Unicode escape sequence
            msg = "Invalid \\uXXXX escape sequence"
            esc = s[end + 1:end + 5]
            escX = esc[1:2]
            if len(esc) != 4 or escX == 'x' or escX == 'X':
                raise JSONDecodeError(msg, s, end - 1)
            try:
                uni = int(esc, 16)
            except ValueError:
                raise JSONDecodeError(msg, s, end - 1)
            end += 5
            # Check for surrogate pair on UCS-4 systems
            # Note that this will join high/low surrogate pairs
            # but will also pass unpaired surrogates through
            if (_maxunicode > 65535 and
                uni & 0xfc00 == 0xd800 and
                s[end:end + 2] == '\\u'):
                esc2 = s[end + 2:end + 6]
                escX = esc2[1:2]
                if len(esc2) == 4 and not (escX == 'x' or escX == 'X'):
                    try:
                        uni2 = int(esc2, 16)
                    except ValueError:
                        raise JSONDecodeError(msg, s, end)
                    if uni2 & 0xfc00 == 0xdc00:
                        uni = 0x10000 + (((uni - 0xd800) << 10) |
                                         (uni2 - 0xdc00))
                        end += 6
            char = unichr(uni)
        # Append the unescaped character
        _append(char)
    return _join(chunks), end


# Use speedup if available
scanstring = c_scanstring or py_scanstring

WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
WHITESPACE_STR = ' \t\n\r'

# Changed Code Here.
UNQUOTEDDICT = {'/': '/', '\\': '\\', ';': ';', '#': '#',
            '=': '=', '{': '{', '}': '}', '[': '[', ']': ']',
            ':': ':', ',': ',', ' ': ' ', '\t': '\t',
            '\f': '\f', '\r': '\r', '\n': '\n'}

# Changed Code Here.
QUOTE_DICT = {'"': '"', "'": "'"}

# Changed Code Here.
def is_literal(char):
    return not UNQUOTEDDICT.get(char, None)

# Changed Code Here.
def is_not_quote(char):
    return not QUOTE_DICT.get(char, None)


# Changed Code Here.
def nexUnquotedKey(s, end):
    chunk = STRINGCHUNKUNQUOTED.match(s,end)
    for i in range(chunk.end()):
        index = i+end
        if not is_literal(s[index]):
            return s[end:index], index




def JSONObject(state, encoding, strict, scan_once, object_hook,
        object_pairs_hook, memo=None,
        _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    (s, end) = state
    # Backwards compatibility
    if memo is None:
        memo = {}
    memo_get = memo.setdefault
    pairs = []
    # Use a slice to prevent IndexError from being raised, the following
    # check will raise a more specific ValueError if the string is empty
    nextchar = s[end:end + 1]
    # Normally we expect nextchar == '"'

    literal_check = False

    # Changed Code Here.
    if is_not_quote(nextchar):
        if nextchar in _ws:
            end = _w(s, end).end()
            nextchar = s[end:end + 1]
        # Trivial empty object
        literal_check = is_literal(nextchar)
        if nextchar == '}':
            if object_pairs_hook is not None:
                result = object_pairs_hook(pairs)
                return result, end + 1
            pairs = {}
            if object_hook is not None:
                pairs = object_hook(pairs)
            return pairs, end + 1
        elif nextchar != '"' and not literal_check:  # Changed Code Here.
            raise JSONDecodeError(
                "Expecting property name enclosed in double quotes",
                s, end)

    # Changed Code Here.
    if not literal_check:
        end += 1

    while True:
        if literal_check:
            key, end = nexUnquotedKey(s,end)
        else:
            # Changed Code Here.
            if nextchar == "'":
                key, end = scanstring(s, end, encoding, strict, SINGLE_QUOTE_BACKSLASH, STRINGCHUNKSINGLEQUOTED.match)
            else:
                key, end = scanstring(s, end, encoding, strict)

        key = memo_get(key, key)

        # To skip some function call overhead we optimize the fast paths where
        # the JSON key separator is ": " or just ":".
        if s[end:end + 1] != ':':
            end = _w(s, end).end()
            if s[end:end + 1] != ':':
                raise JSONDecodeError("Expecting ':' delimiter", s, end)

        end += 1

        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

        value, end = scan_once(s, end)
        pairs.append((key, value))

        try:
            nextchar = s[end]
            if nextchar in _ws:
                end = _w(s, end + 1).end()
                nextchar = s[end]
        except IndexError:
            nextchar = ''
        end += 1

        if nextchar == '}':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter or '}'", s, end - 1)

        try:
            nextchar = s[end]
            if nextchar in _ws:
                end += 1
                nextchar = s[end]
                if nextchar in _ws:
                    end = _w(s, end + 1).end()
                    nextchar = s[end]
        except IndexError:
            nextchar = ''

        # Changed Code Here.
        if not literal_check:
            end += 1
            # Changed Code Here.
            if is_not_quote(nextchar):
                raise JSONDecodeError(
                    "Expecting property name enclosed in double quotes",
                    s, end - 1)

    if object_pairs_hook is not None:
        result = object_pairs_hook(pairs)
        return result, end
    pairs = dict(pairs)
    if object_hook is not None:
        pairs = object_hook(pairs)
    return pairs, end

def JSONArray(state, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    (s, end) = state
    values = []
    nextchar = s[end:end + 1]
    if nextchar in _ws:
        end = _w(s, end + 1).end()
        nextchar = s[end:end + 1]
    # Look-ahead for trivial empty array
    if nextchar == ']':
        return values, end + 1
    elif nextchar == '':
        raise JSONDecodeError("Expecting value or ']'", s, end)
    _append = values.append
    while True:
        value, end = scan_once(s, end)
        _append(value)
        nextchar = s[end:end + 1]
        if nextchar in _ws:
            end = _w(s, end + 1).end()
            nextchar = s[end:end + 1]
        end += 1
        if nextchar == ']':
            break
        elif nextchar != ',':
            raise JSONDecodeError("Expecting ',' delimiter or ']'", s, end - 1)

        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

    return values, end

class JSONDecoder(object):
    """Simple JSON <http://json.org> decoder

    Performs the following translations in decoding by default:

    +---------------+-------------------+
    | JSON          | Python            |
    +===============+===================+
    | object        | dict              |
    +---------------+-------------------+
    | array         | list              |
    +---------------+-------------------+
    | string        | str, unicode      |
    +---------------+-------------------+
    | number (int)  | int, long         |
    +---------------+-------------------+
    | number (real) | float             |
    +---------------+-------------------+
    | true          | True              |
    +---------------+-------------------+
    | false         | False             |
    +---------------+-------------------+
    | null          | None              |
    +---------------+-------------------+

    It also understands ``NaN``, ``Infinity``, and ``-Infinity`` as
    their corresponding ``float`` values, which is outside the JSON spec.

    """

    def __init__(self, encoding=None, object_hook=None, parse_float=None,
            parse_int=None, parse_constant=None, strict=True,
            object_pairs_hook=None):
        """
        *encoding* determines the encoding used to interpret any
        :class:`str` objects decoded by this instance (``'utf-8'`` by
        default).  It has no effect when decoding :class:`unicode` objects.

        Note that currently only encodings that are a superset of ASCII work,
        strings of other encodings should be passed in as :class:`unicode`.

        *object_hook*, if specified, will be called with the result of every
        JSON object decoded and its return value will be used in place of the
        given :class:`dict`.  This can be used to provide custom
        deserializations (e.g. to support JSON-RPC class hinting).

        *object_pairs_hook* is an optional function that will be called with
        the result of any object literal decode with an ordered list of pairs.
        The return value of *object_pairs_hook* will be used instead of the
        :class:`dict`.  This feature can be used to implement custom decoders
        that rely on the order that the key and value pairs are decoded (for
        example, :func:`collections.OrderedDict` will remember the order of
        insertion). If *object_hook* is also defined, the *object_pairs_hook*
        takes priority.

        *parse_float*, if specified, will be called with the string of every
        JSON float to be decoded.  By default, this is equivalent to
        ``float(num_str)``. This can be used to use another datatype or parser
        for JSON floats (e.g. :class:`decimal.Decimal`).

        *parse_int*, if specified, will be called with the string of every
        JSON int to be decoded.  By default, this is equivalent to
        ``int(num_str)``.  This can be used to use another datatype or parser
        for JSON integers (e.g. :class:`float`).

        *parse_constant*, if specified, will be called with one of the
        following strings: ``'-Infinity'``, ``'Infinity'``, ``'NaN'``.  This
        can be used to raise an exception if invalid JSON numbers are
        encountered.

        *strict* controls the parser's behavior when it encounters an
        invalid control character in a string. The default setting of
        ``True`` means that unescaped control characters are parse errors, if
        ``False`` then control characters will be allowed in strings.

        """
        if encoding is None:
            encoding = DEFAULT_ENCODING
        self.encoding = encoding
        self.object_hook = object_hook
        self.object_pairs_hook = object_pairs_hook
        self.parse_float = parse_float or float
        self.parse_int = parse_int or int
        self.parse_constant = parse_constant or _CONSTANTS.__getitem__
        self.strict = strict
        self.parse_object = JSONObject
        self.parse_array = JSONArray
        self.parse_string = scanstring
        self.parse_single_quoted_string = parse_single_quoted_string  # Changed Code Here.
        self.memo = {}
        self.scan_once = make_scanner(self)

    def decode(self, s, _w=WHITESPACE.match, _PY3=PY3):
        """Return the Python representation of ``s`` (a ``str`` or ``unicode``
        instance containing a JSON document)

        """
        if _PY3 and isinstance(s, binary_type):
            s = s.decode(self.encoding)
        obj, end = self.raw_decode(s)
        end = _w(s, end).end()
        if end != len(s):
            raise JSONDecodeError("Extra data", s, end, len(s))
        return obj

    def raw_decode(self, s, idx=0, _w=WHITESPACE.match, _PY3=PY3):
        """Decode a JSON document from ``s`` (a ``str`` or ``unicode``
        beginning with a JSON document) and return a 2-tuple of the Python
        representation and the index in ``s`` where the document ended.
        Optionally, ``idx`` can be used to specify an offset in ``s`` where
        the JSON document begins.

        This can be used to decode a JSON document from a string that may
        have extraneous data at the end.

        """
        if idx < 0:
            # Ensure that raw_decode bails on negative indexes, the regex
            # would otherwise mask this behavior. #98
            raise JSONDecodeError('Expecting value', s, idx)
        if _PY3 and not isinstance(s, text_type):
            raise TypeError("Input string must be text, not bytes")
        # strip UTF-8 bom
        if len(s) > idx:
            ord0 = ord(s[idx])
            if ord0 == 0xfeff:
                idx += 1
            elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
                idx += 3
        return self.scan_once(s, idx=_w(s, idx).end())

2)simplejson具有scanner.py蟒文件。 替换该文件的代码,此代码

 """JSON token scanner
"""
import re
from .errors import JSONDecodeError
def _import_c_make_scanner():
    try:
        from ._speedups import make_scanner
        return make_scanner
    except ImportError:
        return None
c_make_scanner = _import_c_make_scanner()

__all__ = ['make_scanner', 'JSONDecodeError']

NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))


def py_make_scanner(context):
    parse_object = context.parse_object
    parse_array = context.parse_array
    parse_string = context.parse_string
    parse_single_quoted_string = context.parse_single_quoted_string  # Changed Code Here.
    match_number = NUMBER_RE.match
    encoding = context.encoding
    strict = context.strict
    parse_float = context.parse_float
    parse_int = context.parse_int
    parse_constant = context.parse_constant
    object_hook = context.object_hook
    object_pairs_hook = context.object_pairs_hook
    memo = context.memo

    def _scan_once(string, idx):
        errmsg = 'Expecting value'
        try:
            nextchar = string[idx]
        except IndexError:
            raise JSONDecodeError(errmsg, string, idx)

        if nextchar == '"':
            return parse_string(string, idx + 1, encoding, strict)
        elif nextchar == "'":
            return parse_single_quoted_string(string, idx + 1, encoding, strict)  # Changed Code Here.
        elif nextchar == '{':
            return parse_object((string, idx + 1), encoding, strict,
                _scan_once, object_hook, object_pairs_hook, memo)
        elif nextchar == '[':
            return parse_array((string, idx + 1), _scan_once)
        elif nextchar == 'n' and string[idx:idx + 4] == 'null':
            return None, idx + 4
        elif nextchar == 't' and string[idx:idx + 4] == 'true':
            return True, idx + 4
        elif nextchar == 'f' and string[idx:idx + 5] == 'false':
            return False, idx + 5

        m = match_number(string, idx)
        if m is not None:
            integer, frac, exp = m.groups()
            if frac or exp:
                res = parse_float(integer + (frac or '') + (exp or ''))
            else:
                res = parse_int(integer)
            return res, m.end()
        elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
            return parse_constant('NaN'), idx + 3
        elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
            return parse_constant('Infinity'), idx + 8
        elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
            return parse_constant('-Infinity'), idx + 9
        else:
            raise JSONDecodeError(errmsg, string, idx)

    def scan_once(string, idx):
        if idx < 0:
            # Ensure the same behavior as the C speedup, otherwise
            # this would work for *some* negative string indices due
            # to the behavior of __getitem__ for strings. #98
            raise JSONDecodeError('Expecting value', string, idx)
        try:
            return _scan_once(string, idx)
        finally:
            memo.clear()

    return scan_once

make_scanner = c_make_scanner or py_make_scanner

现在,你都设置。 Simplejson是我用最快的,稳定的图书馆。

我已经改变了在simplejson几行代码,现在这个真棒库工程

  • 非上市JSON键和单引号JSON字符串和键

我只是改变了Python代码。 所以,如果你使用的C扩展为加快提升那么这段代码将无法正常工作。

无论我所做的更改我已经添加评论#改为代码这里

我以前错误的回答来宾用户,现在我无法登录,所以我张贴在新的线程来编辑答案。



文章来源: Single versus double quotes in json loads in Python