Question:
I'm currently developing a web application and using JSON for ajax requests and responses. I have an area where I return a very large dataset to the client in the form of an array of over 10,000 objects. Here's part of an example (it's been simplified somewhat):
"schedules": [
{
"codePractice": 35,
"codeScheduleObject": 576,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 12,
"name": "Dr. 1"
},
{
"codePractice": 35,
"codeScheduleObject": 169,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 43,
"name": "Dr. 2"
},
{
"codePractice": 35,
"codeScheduleObject": 959,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 76,
"name": "Dr. 3"
}
]
As you can imagine, with a very large number of objects in this array, the size of the JSON response can be quite large.
My question is: is there a JSON stringifier/parser that would convert the "schedules" array into something like the following JSON string:
"schedules": [
["codePractice", "codeScheduleObject", "codeLogin", "codeScheduleObjectType", "defaultCodeScheduleObject","name"],
[35, 576, "", 12, "Dr. 1"],
[35, 169, "", 43, "Dr. 2"],
[35, 959, "", 76, "Dr. 3"],
]
i.e., there would be an array at the beginning of the "schedules" array holding the keys of the objects, and each of the remaining arrays would hold the values of one object.
I could, if I wanted, do the conversion on the server and reverse it on the client, but I'm wondering if there's a standard library for parsing/stringifying large JSON in this way?
I could also run it through a minifier, but I'd like to keep the keys I currently have, as they give some context within the application.
I'm also hoping you might critique my approach here or suggest alternatives.
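For illustration, here is a minimal sketch of that server-side conversion in C#, assuming Json.NET (Newtonsoft.Json); the SchedulePacker class and its Pack/Unpack methods are hypothetical names for this sketch, not part of any standard library:

using System.Linq;
using Newtonsoft.Json.Linq;

static class SchedulePacker
{
    // [{"a": 1, "b": 2}, ...]  ->  [["a", "b"], [1, 2], ...]
    // Assumes every object has the same keys, in the same order.
    public static JArray Pack(JArray objects)
    {
        var keys = ((JObject)objects[0]).Properties().Select(p => p.Name).ToList();
        var packed = new JArray { new JArray(keys) }; // header row first
        foreach (JObject o in objects)
            packed.Add(new JArray(keys.Select(k => o[k])));
        return packed;
    }

    // The inverse transform, for whichever side receives the packed form.
    public static JArray Unpack(JArray packed)
    {
        var keys = ((JArray)packed[0]).Select(k => (string)k).ToList();
        var objects = new JArray();
        foreach (var row in packed.Skip(1))
        {
            var o = new JObject();
            for (int i = 0; i < keys.Count; i++)
                o[keys[i]] = row[i];
            objects.Add(o);
        }
        return objects;
    }
}

Packed this way, each record shrinks from five key/value pairs to a bare value array, with the keys appearing once in the header row.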
Answer 1:
HTTP compression (e.g. gzip or deflate) already does exactly that: repeated patterns, like your JSON keys, are replaced with tokens, so the verbose pattern only has to occur once per transmission.
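Browsers advertise this support automatically; for a programmatic .NET client, opting in looks roughly like the sketch below (the endpoint URL is a placeholder):

// A .NET client advertising gzip/deflate; the server (IIS, mod_deflate,
// nginx gzip, ...) performs the actual compression.
using System.Net;
using System.Net.Http;

var handler = new HttpClientHandler
{
    // Sends "Accept-Encoding: gzip, deflate" and transparently decompresses.
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
using var client = new HttpClient(handler);
string json = await client.GetStringAsync("https://example.com/api/schedules");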
Answer 2:
Not an answer as such, but here's a rough estimate of the "savings", based on 10k entries and some bogus data :-) This is in response to a comment I posted: will the added complexity make the schema'ized approach worth it?
"It depends."
The C# below is a LINQPad script, ready to run for testing/modifying:
// Builds one record in the original, verbose JSON shape.
string LongTemplate(int n1, int n2, int n3, string name) {
    return string.Format(@"
{{
""codePractice"": {0},
""codeScheduleObject"": {1},
""codeScheduleObjectType"": """",
""defaultCodeScheduleObject"": {2},
""name"": ""Dr. {3}""
}}," + "\n", n1, n2, n3, name);
}

// Builds one record in the schema'ized shape: a bare array of values.
string ShortTemplate(int n1, int n2, int n3, string name) {
    return string.Format("[{0}, {1}, \"\", {2}, \"Dr. {3}\"],\n",
        n1, n2, n3, name);
}

// Like ShortTemplate, with all insignificant whitespace removed.
string MinTemplate(int n1, int n2, int n3, string name) {
    return string.Format("[{0},{1},\"\",{2},\"Dr. {3}\"],",
        n1, n2, n3, name);
}

// Returns the GZip-compressed size of a string, in bytes.
long GZippedSize(string s) {
    var ms = new MemoryStream();
    using (var gzip = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress, true))
    using (var sw = new StreamWriter(gzip)) {
        sw.Write(s);
    }
    return ms.Position;
}

void Main()
{
    var r = new Random();
    var l = new StringBuilder(); // long/verbose form
    var s = new StringBuilder(); // short/schema'ized form
    var m = new StringBuilder(); // minified short form
    for (int i = 0; i < 10000; i++) {
        var n1 = r.Next(10000);
        var n2 = r.Next(10000);
        var n3 = r.Next(10000);
        var name = "bogus" + r.Next(50);
        l.Append(LongTemplate(n1, n2, n3, name));
        s.Append(ShortTemplate(n1, n2, n3, name));
        m.Append(MinTemplate(n1, n2, n3, name));
    }
    var lc = GZippedSize(l.ToString());
    var sc = GZippedSize(s.ToString());
    var mc = GZippedSize(m.ToString());
    Console.WriteLine(string.Format("Long:\tNormal={0}\tGZip={1}\tCompressed={2:P}", l.Length, lc, (float)lc / l.Length));
    Console.WriteLine(string.Format("Short:\tNormal={0}\tGZip={1}\tCompressed={2:P}", s.Length, sc, (float)sc / s.Length));
    Console.WriteLine(string.Format("Min:\tNormal={0}\tGZip={1}\tCompressed={2:P}", m.Length, mc, (float)mc / m.Length));
    Console.WriteLine(string.Format("Short/Long\tRegular={0:P}\tGZip={1:P}",
        (float)s.Length / l.Length, (float)sc / lc));
    Console.WriteLine(string.Format("Min/Long\tRegular={0:P}\tGZip={1:P}",
        (float)m.Length / l.Length, (float)mc / lc));
}
My results:
Long: Normal=1754614 GZip=197053 Compressed=11.23 %
Short: Normal=384614 GZip=128252 Compressed=33.35 %
Min: Normal=334614 GZip=128252 Compressed=38.33 %
Short/Long Regular=21.92 % GZip=65.09 %
Min/Long Regular=19.07 % GZip=65.09 %
Conclusion:
- The single biggest saving is to use GZIP (better than schema'izing alone).
- GZIP + schema'ized is the smallest overall.
- With GZIP there is no point in running a normal JavaScript minifier (in this scenario).
- Use GZIP (e.g. DEFLATE); it performs very well on repetitive structured text (the verbose form compressed to roughly a ninth of its original size).
Happy coding.
Answer 3:
Here's an article that does pretty much what you're looking to do:
http://stevehanov.ca/blog/index.php?id=104
At first glance, it looks like your example would be compressed down to the following after the first step of the algorithm (subsequent steps do more work on it):
{
"templates": [
["codePractice", "codeScheduleObject", "codeScheduleObjectType", "defaultCodeScheduleObject", "name"]
],
"values": [
{ "type": 1, "values": [ 35, 576, "", 12, "Dr. 1" ] },
{ "type": 1, "values": [ 35, 169, "", 43, "Dr. 2" ] },
{ "type": 1, "values": [ 35, 959, "", 76, "Dr. 3" ] }
]
}
You can start to see the benefit of the algorithm already. Here's the final output after running it through the compressor:
{
"f" : "cjson",
"t" : [
[0,"schedules"],
[0,"codePractice","codeScheduleObject","codeScheduleObjectType","defaultCodeScheduleObject","name"]
],
"v" : {
"" : [ 1, [
{ "" : [2, 35, 576, "", 12, "Dr. 1"] },
{ "" : [2, 35, 169, "", 43, "Dr. 2"] },
{ "" : [2, 35, 959, "", 76, "Dr. 3"] }
]
]
}
}
The improvement is obvious once you have several thousand records. The output is still readable, but I think the other answers are right too: a good compression algorithm will remove the repeated blocks of text anyway...
Answer 4:
Before you change your JSON schema, give this a shot:
http://httpd.apache.org/docs/2.0/mod/mod_deflate.html
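A minimal configuration, adapted from the stock example in that documentation, might look like this (extending it with application/json is an assumption for this use case):

# Compress text and JSON responses with DEFLATE
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/json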
Answer 5:
For the record, I am doing exactly that in PHP, with a list of objects from a database:
$comp = base64_encode(gzcompress(json_encode($json)));
json: string(22501)
gz compressed: string(711), but it's a binary format.
gz compressed + base64: string(948), a text format.
So it's considerably smaller, at the cost of a fraction of a second.
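For completeness, the matching decode step on the receiving side uses the same standard PHP functions in reverse:

// Reverse of the encoding above: un-base64, decompress, then parse.
$json = json_decode(gzuncompress(base64_decode($comp)));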