Python: Sanitize data in JSON before sending

Posted 2020-08-04 04:06

Question:

I have a JSON file that needs to be sent. Before sending it, I need to do a validity check and remove some special characters (spaces and dots).

The problem is that Python puts a u character in front of each of my strings, and the server can't read that. How do I get rid of the u character and sanitize the data (remove those characters)?

Original JSON:

{
    "columns": [
        {
            "data": "Doc.",
            "title": "Doc."
        },
        {
            "data": "Order no.",
            "title": "Order no."
        },
        {
            "data": "Nothing",
            "title": "Nothing"
        }
    ],
    "data": [
        {
            "Doc.": "564251422",
            "Nothing": 0.0,
            "Order no.": "56421"
        },
        {
            "Doc.": "546546545",
            "Nothing": 0.0,
            "Order no.": "98745"
        }
    ]
}

Python:

import json
def func():
    with open('json/simpledata.json', 'r') as json_file:
        json_data = json.load(json_file)
        print(json_data)
func()

Output JSON:

{u'data': [{u'Nothing': 0.0, u'Order no.': u'56421', u'Doc.': u'564251422'}, {u'Nothing': 0.0, u'Order no.': u'98745', u'Doc.': u'546546545'}], u'columns': [{u'data': u'Doc.', u'title': u'Doc.'}, {u'data': u'Order no.', u'title': u'Order no.'}, {u'data': u'Nothing', u'title': u'Nothing'}]}

What I'm trying to achieve in Python (this is my current JavaScript version):

    // remove whitespace and dots from the keys of each data row
    sanitizeData: function(jsonArray) {
        var newKey;
        jsonArray.forEach(function(item) {
            for (var key in item) {
                newKey = key.replace(/\s/g, '').replace(/\./g, '');
                if (key != newKey) {
                    item[newKey] = item[key];
                    delete item[key];
                }
            }
        });
        return jsonArray;
    },
    // remove whitespace and dots from the 'data' property (<propName> reference) of each column
    sanitizeColumns: function(jsonArray) {
        var dataProp;
        jsonArray.forEach(function(item) {
            dataProp = item['data'].replace(/\s/g, '').replace(/\./g, '');
            item['data'] = dataProp;
        });
        return jsonArray;
    }
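For reference, a straightforward Python 3 translation of these two JavaScript functions might look like the following sketch. The names sanitize_data and sanitize_columns are hypothetical, chosen to mirror the JavaScript; like the original, both functions modify the lists in place and also return them.

```python
import re

# Mirrors the JavaScript regexes /\s/g and /\./g: strip all whitespace and dots.
_STRIP = re.compile(r"[\s.]")


def sanitize_data(rows):
    """Rename every key of each row dict, dropping whitespace and dots."""
    for item in rows:
        # Iterate over a snapshot of the keys, because the dict is modified.
        for key in list(item):
            new_key = _STRIP.sub("", key)
            if new_key != key:
                item[new_key] = item.pop(key)
    return rows


def sanitize_columns(columns):
    """Clean the 'data' property of each column description."""
    for item in columns:
        item["data"] = _STRIP.sub("", item["data"])
    return columns


# Usage on a trimmed-down version of the question's document:
doc = {
    "columns": [{"data": "Order no.", "title": "Order no."}],
    "data": [{"Order no.": "56421", "Nothing": 0.0}],
}
sanitize_columns(doc["columns"])
sanitize_data(doc["data"])
print(doc)
```

Only the 'data' property of each column is changed, so "title" keeps its original text, just as in the JavaScript.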

Answer 1:

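To print the data as actual JSON text, use print(json.dumps(json_data)). The u prefix is not part of your data: it is how Python 2 displays unicode strings when it prints a dict (the object's repr), and json.dumps() serializes the object to valid JSON without it.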

See also https://docs.python.org/2/library/json.html#json.dumps
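A small sketch of the difference, using one row from the question. sort_keys=True is only there to make the key order predictable:

```python
import json

json_data = {u"Doc.": u"564251422", u"Nothing": 0.0}

# print(json_data) would show Python's repr of the dict (in Python 2 with
# u'...' prefixes); json.dumps() returns real JSON text with double quotes.
payload = json.dumps(json_data, sort_keys=True)
print(payload)  # {"Doc.": "564251422", "Nothing": 0.0}
```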

For removing certain characters from a string you can do the obvious thing:

string = string.replace(".", "").replace(" ", "")

or, more efficiently, use str.translate (this example only works in Python 2):

string = string.translate(None, " .")
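In Python 3, str.translate() no longer takes a deletechars argument. Instead, it takes a translation table, and str.maketrans("", "", chars) builds one that deletes every character in chars. A minimal sketch (the constant name is just for illustration):

```python
# Translation table that maps " " and "." to None, i.e. deletes them.
DELETE_SPACES_AND_DOTS = str.maketrans("", "", " .")

string = "Order no."
string = string.translate(DELETE_SPACES_AND_DOTS)
print(string)  # Orderno
```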

or with a regular expression, using re.sub:

import re
string = re.sub(r"[ .]", "", string)
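One difference from the question's JavaScript is that /\s/g removes every kind of whitespace (tabs, newlines, ...), while [ .] only matches a literal space or dot. If that matters, the character class can use \s instead. This is a sketch, and the helper name strip_ws_and_dots is hypothetical:

```python
import re


def strip_ws_and_dots(s):
    # \s matches any whitespace character, like /\s/g in JavaScript.
    return re.sub(r"[\s.]", "", s)


print(strip_ws_and_dots("Order\tno."))          # Orderno
print(repr(re.sub(r"[ .]", "", "Order\tno.")))  # 'Order\tno' (tab kept)
```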

Then use a comprehension to go over the whole dictionary (use items() in Python 3):

sanitize = lambda s: re.sub(r"[ .]", "", s)
table = {sanitize(k):sanitize(v) for k, v in table.iteritems()}
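In Python 3 the same comprehension uses items(). Note that the question's rows contain a float ("Nothing": 0.0), and re.sub() raises TypeError when it is given a float. This sketch therefore only cleans values that are strings and passes the rest through unchanged:

```python
import re


def sanitize(s):
    return re.sub(r"[ .]", "", s)


row = {"Doc.": "564251422", "Nothing": 0.0, "Order no.": "56421"}

# Keys are always strings here; non-string values are left as they are.
clean = {sanitize(k): (sanitize(v) if isinstance(v, str) else v)
         for k, v in row.items()}
print(clean)  # {'Doc': '564251422', 'Nothing': 0.0, 'Orderno': '56421'}
```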

But this only works on a shallow dictionary. It doesn't look like your JavaScript version handles deeply nested structures either, though. If you need that, you can use recursion (for Python 3, use items() instead of iteritems() and str instead of basestring):

import re

def sanitize(value):
    # Recurse into dicts (keys and values) and lists, clean strings,
    # and return anything else (numbers, None, ...) unchanged
    if isinstance(value, dict):
        value = {sanitize(k):sanitize(v) for k, v in value.iteritems()}
    elif isinstance(value, list):
        value = [sanitize(v) for v in value]
    elif isinstance(value, basestring):
        value = re.sub(r"[ .]", "", value)
    return value

table = sanitize(table)
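Here is the same recursive function with the Python 3 substitutions applied, run on a cut-down version of the question's JSON. Unlike the question's sanitizeColumns, it also cleans every string value, so "title" loses its dot too:

```python
import json
import re


def sanitize(value):
    """Recursively strip spaces and dots from every key and string value."""
    if isinstance(value, dict):
        return {sanitize(k): sanitize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    if isinstance(value, str):
        return re.sub(r"[ .]", "", value)
    return value  # numbers, booleans and None pass through unchanged


table = json.loads('{"columns": [{"data": "Doc.", "title": "Doc."}],'
                   ' "data": [{"Doc.": "564251422", "Nothing": 0.0}]}')
table = sanitize(table)
print(json.dumps(table))
```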


Answer 2:

I just wanted to add a variation on the excellent solution by @Felk. I had a bunch of keys with dots in them. The solution from @Felk removed the dots from the keys, but also from the values, which I did not want. So for anyone who, like me, lands on this post looking for a solution that only sanitizes the keys, here it is.

import re

def sanitize(value, is_value=True):
    # is_value is False only for dict keys; only keys have their dots removed
    if isinstance(value, dict):
        value = {sanitize(k,False):sanitize(v,True) for k, v in value.items()}
    elif isinstance(value, list):
        value = [sanitize(v, True) for v in value]
    elif isinstance(value, str):
        if not is_value:
            value = re.sub(r"[.]", "", value)
    return value

table = sanitize(table)
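The question also wants whitespace removed from the keys. A variant of this key-only approach that widens the character class to [\s.] (that change is an assumption, not part of this answer) behaves like this on a sample row:

```python
import re


def sanitize(value, is_value=True):
    if isinstance(value, dict):
        return {sanitize(k, False): sanitize(v, True) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize(v, True) for v in value]
    if isinstance(value, str) and not is_value:
        # Keys only: drop whitespace and dots; string values are untouched.
        return re.sub(r"[\s.]", "", value)
    return value


row = {"Doc.": "Doc.", "Order no.": "56421"}
print(sanitize(row))  # {'Doc': 'Doc.', 'Orderno': '56421'}
```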


Answer 3:

Example:

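import json

# Load the file (the with block closes it), then serialize back to a JSON string
with open('data.json', 'r') as json_file:
    json_d = json.load(json_file)
json_d = json.dumps(json_d)
print(json_d)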
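json.dumps() returns a str, so to actually send the result it still has to be encoded to bytes. Here is a minimal sketch with the standard library's urllib.request. It assumes a hypothetical endpoint URL and a server that accepts a UTF-8 JSON body via POST. The network call is commented out so the sketch runs offline:

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real server URL.
URL = "https://example.com/api/upload"

table = {"columns": [{"data": "Doc", "title": "Doc."}],
         "data": [{"Doc": "564251422", "Nothing": 0.0}]}

body = json.dumps(table).encode("utf-8")  # JSON text -> bytes for the request
request = urllib.request.Request(
    URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```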