Is it worth the effort to try to reduce JSON size?

Posted 2020-02-19 05:44

Question:

I am submitting a relatively large amount of data from a mobile application (up to 1000 JSON objects), which I would normally encode like this:

[{
    "id": 12,
    "score": 34,
    "interval": 5678,
    "sub": 9012
}, {
    id: ...
}, ...]

I could make the payload smaller by submitting an array of arrays instead:

[[12, 34, 5678, 9012], [...], ...]

to save some space on the property names, and recreate the objects on the server (as the schema is fixed, or at least it is a contract between the server and the client).
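A minimal sketch of that contract in Python, assuming the four field names from the example above and a field order both sides agree on (`FIELDS`, `to_rows` and `from_rows` are hypothetical names, used only for illustration):

```python
import json

# Hypothetical field order that client and server agree on (the "contract").
FIELDS = ("id", "score", "interval", "sub")


def to_rows(records):
    """Client side: drop the property names, keep values in FIELDS order."""
    return [[rec[name] for name in FIELDS] for rec in records]


def from_rows(rows):
    """Server side: rebuild the objects, rejecting rows of the wrong length."""
    objects = []
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} values, got {len(row)}")
        objects.append(dict(zip(FIELDS, row)))
    return objects


records = [
    {"id": 12, "score": 34, "interval": 5678, "sub": 9012},
    {"id": 98, "score": 76, "interval": 5432, "sub": 1098},
]
payload = json.dumps(to_rows(records), separators=(",", ":"))
print(payload)  # [[12,34,5678,9012],[98,76,5432,1098]]
assert from_rows(json.loads(payload)) == records
```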

The payload is then submitted in a POST request, most likely over a 3G connection (or possibly Wi-Fi).

It looks like I am saving some bandwidth by using nested arrays, but I'm not sure it is noticeable when gzip is applied, and I'm not sure how to precisely and objectively measure the difference.

On the other hand, nested arrays don't feel like a good idea: they are less readable, which makes errors harder to spot while debugging. And since we'd be flushing readability down the toilet anyway, we could go further and flatten the array: each child array has a fixed number of elements, so the server could simply slice the flat list into chunks and reconstruct the objects.
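The flattened variant could look like this; again only a sketch, assuming the same hypothetical `FIELDS` order and that every record has exactly that many values (`flatten` and `unflatten` are illustrative names):

```python
# Hypothetical field order shared by client and server.
FIELDS = ("id", "score", "interval", "sub")
WIDTH = len(FIELDS)


def flatten(records):
    """Client side: one flat list of values, record after record."""
    return [rec[name] for rec in records for name in FIELDS]


def unflatten(values):
    """Server side: slice the flat list into WIDTH-sized chunks and rebuild objects."""
    if len(values) % WIDTH != 0:
        raise ValueError("payload length is not a multiple of the record width")
    return [
        dict(zip(FIELDS, values[i:i + WIDTH]))
        for i in range(0, len(values), WIDTH)
    ]


records = [
    {"id": 12, "score": 34, "interval": 5678, "sub": 9012},
    {"id": 98, "score": 76, "interval": 5432, "sub": 1098},
]
flat = flatten(records)
print(flat)  # [12, 34, 5678, 9012, 98, 76, 5432, 1098]
assert unflatten(flat) == records
```

The length check is about the only safety net left once the names are gone: it catches a payload with a missing value, but not one where a value is dropped in one record and an extra one appears in another.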

Any further reading material on this topic is much appreciated.

Answer 1:

JSONH, aka hpack (https://github.com/WebReflection/JSONH), does something very similar to your example:

[{
    "id": 12,
    "score": 34,
    "interval": 5678,
    "sub": 9012
}, {
    "id": 98,
    "score": 76,
    "interval": 5432,
    "sub": 1098
}, ...]

Would turn into:

[["id","score","interval","sub"],12,34,5678,9012,98,76,5432,1098,...]
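To make the idea concrete, here is a small Python sketch that packs and unpacks data in the shape shown above (a list of keys followed by all the values). This is not JSONH's own code, and JSONH's actual output layout may differ in detail; `hpack` and `hunpack` are hypothetical names, and the sketch assumes every object has the same keys as the first one:

```python
def hpack(records):
    """Pack a list of same-shaped dicts as [keys, v1, v2, ...]."""
    if not records:
        return [[]]
    keys = list(records[0])
    packed = [keys]
    for rec in records:
        packed.extend(rec[key] for key in keys)
    return packed


def hunpack(packed):
    """Inverse of hpack: read the key list, then slice the values."""
    keys, values = packed[0], packed[1:]
    if not keys:
        return []
    width = len(keys)
    return [
        dict(zip(keys, values[i:i + width]))
        for i in range(0, len(values), width)
    ]


records = [
    {"id": 12, "score": 34, "interval": 5678, "sub": 9012},
    {"id": 98, "score": 76, "interval": 5432, "sub": 1098},
]
packed = hpack(records)
print(packed)  # [['id', 'score', 'interval', 'sub'], 12, 34, 5678, 9012, 98, 76, 5432, 1098]
assert hunpack(packed) == records
```

Unlike a hard-coded schema, the key list travels with the payload, so the server needs no prior agreement on field order.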


Answer 2:

JSON is meant for readability. You could have an intermediate format if you're concerned about space. Create a serialize/deserialize function which takes a JSON file and creates a compressed binary storing your data as compactly as is reasonable, then read that format on the other end of the line.

See http://en.wikipedia.org/wiki/Json, whose first sentence reads: "JSON...is a lightweight text-based open standard designed for human-readable data interchange."

Essentially, my point is that humans would always see the JSON, and machines would primarily see the binary. You get the best of both worlds: readability and small data transfer (at the cost of a tiny amount of computation).
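As one possible intermediate format, here is a sketch using Python's standard `struct` module. It assumes every field is an integer that fits in a signed 32-bit value (an assumption about your data); the field names come from the question, while `to_binary` and `from_binary` are hypothetical names. Each record becomes a fixed 16 bytes, and the result could still be gzipped on top:

```python
import struct

# Assumption: every field is an integer in the signed 32-bit range.
FIELDS = ("id", "score", "interval", "sub")
RECORD = struct.Struct("<4i")  # little-endian, four int32 values = 16 bytes


def to_binary(records):
    """Pack each record's fields, in FIELDS order, into fixed 16-byte chunks."""
    return b"".join(RECORD.pack(*(rec[name] for name in FIELDS)) for rec in records)


def from_binary(blob):
    """Unpack 16-byte chunks back into dicts (struct.error if the length is off)."""
    return [dict(zip(FIELDS, values)) for values in RECORD.iter_unpack(blob)]


records = [
    {"id": 12, "score": 34, "interval": 5678, "sub": 9012},
    {"id": 98, "score": 76, "interval": 5432, "sub": 1098},
]
blob = to_binary(records)
print(len(blob))  # 32
assert from_binary(blob) == records
```

For 1000 records that is 16,000 bytes before any compression, and a readable JSON view for debugging is just `json.dumps(from_binary(blob))` away.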



Answer 3:

Gzip replaces recurring parts of your message with short back-references to an earlier occurrence. The algorithm is pretty "dumb", but for repetitive data like this it works very well. I don't think you'll see a noticeable decrease in over-the-wire size, because after compression your object "structure" is effectively sent only once.

You can roughly test this by gzipping two sample JSON files, or by capturing the HTTP request with Fiddler, which can show both the compressed and uncompressed sizes.
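A rough script for that comparison in Python, using only the standard library. The records are synthetic random integers shaped like the question's example (an assumption; real payloads will compress differently), and `gzip.compress` uses its default level, which may not match what your HTTP stack uses. It also assumes the POST body is actually gzip-compressed in transit, which for request bodies usually has to be set up explicitly by the client:

```python
import gzip
import json
import random

FIELDS = ("id", "score", "interval", "sub")


def sizes(obj):
    """Return (raw bytes, gzipped bytes) of the compact JSON encoding of obj."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


random.seed(0)  # reproducible synthetic data
records = [
    {
        "id": i,
        "score": random.randint(0, 100),
        "interval": random.randint(0, 10000),
        "sub": random.randint(0, 10000),
    }
    for i in range(1000)
]
rows = [[rec[name] for name in FIELDS] for rec in records]

obj_raw, obj_gz = sizes(records)
row_raw, row_gz = sizes(rows)
print(f"objects: {obj_raw} bytes raw, {obj_gz} bytes gzipped")
print(f"arrays:  {row_raw} bytes raw, {row_gz} bytes gzipped")
```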



Answer 4:

Since you're using this on a mobile device (you mention 3G), you might actually want to care about size, not readability. Moreover, do you frequently expect to read what is being transmitted over the wire?

Here is a suggestion for an alternative format.

ProtoBuf is one option. Google uses it internally, and there is a ProtoBuf 'compiler' which can read .proto files (containing a message description) and generate Java/C++/Python serializers/deserializers, which use a binary form for transmission over the wire. You simply use the generated classes on both ends, and forget about what the object looks like when transmitted over the wire. There is also an Obj-C port maintained externally.
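To give an idea of why the binary form is small: ProtoBuf encodes integer fields as base-128 varints, using 7 bits per byte with the high bit signalling that more bytes follow. Below is a minimal Python sketch of just that varint idea. It is not the protobuf library, it handles only non-negative integers, and it omits the per-field tag bytes the real wire format also carries; `encode_varint` and `decode_varints` are hypothetical names:

```python
def encode_varint(n):
    """Base-128 varint: low 7 bits first, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte of this value
            return bytes(out)


def decode_varints(data):
    """Decode a concatenation of varints back into a list of ints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    if shift:
        raise ValueError("truncated varint at end of data")
    return values


record = [12, 34, 5678, 9012]
encoded = b"".join(encode_varint(v) for v in record)
print(len(encoded))  # 6
assert decode_varints(encoded) == record
```

Small numbers take a single byte, so the record above needs 6 bytes of values, against 17 characters for `[12,34,5678,9012]` as JSON.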

Here is a comparison of ProtoBuf against XML, on the ProtoBuf website (I know XML is not what you use, still).

Finally, here is a Python tutorial.



Answer 5:

Although this is an old question, I'd like to add a few words.

In my experience, large differences in raw JSON size amount to very little after compression. I prefer to keep it human-readable.

With real numbers: a 1.29 MB JSON file and its optimized 145 KB version compressed to 32 KB and 9 KB respectively, so a raw difference of over 1.1 MB shrank to about 23 KB.

Except in extreme conditions, I think differences of this kind are negligible, while the cost in readability is huge.

A:

{
  "Code": "FCEB97B6",
  "Date": "\/Date(1437706800000)\/",
  "TotalQuantity": 1,
  "Items": [
    {
      "CapsulesQuantity": 0,
      "Quantity": 1,
      "CurrentItem": {
        "ItemId": "SHIELD_AXA",
        "Order": 30,
        "GroupId": "G_MODS",
        "TypeId": "T_SHIELDS",
        "Level": 0,
        "Rarity": "R4",
        "UniqueId": null,
        "Name": "AXA Shield"
      }
    }
  ],
  "FormattedDate": "2015-Jul.-24"
}

B:

{
  "fDate": "2016-Mar.-01",
  "totCaps": 9,
  "totIts": 14,
  "rDays": 1,
  "avg": "1,56",
  "cells": {
    "00": {
      "30": 1
    },
    "03": {
      "30": 1
    },
    "08": {
      "25": 1
    },
    "09": {
      "26": 3
    },
    "12": {
      "39": 1
    },
    "14": {
      "33": 1
    },
    "17": {
      "40": 3
    },
    "19": {
      "41": 2
    },
    "20": {
      "41": 1
    }
  }
}

These are fragments of the two files.