Table vs XML / JSON / YAML - table requires less storage

Posted 2019-09-20 00:42

Question:

Adding a field to an XML object takes the length of the field name + 3 characters (or 7 when nested), and for JSON 4 (or 6 when nested):

<xml>xml</xml>       xml="xml"    
{"json":json,}       "json": json,       

Assume the average markup overhead is 4 characters and the average field name length is 11. To justify using XML/JSON over a table in terms of storage, each field must then on average appear in less than ~1/15 of the objects; in other words, the whole related group of objects must have roughly 15 times more distinct fields than a single object has on average. (Yet a table may well still allow faster computation even when this ratio is higher and the table is bigger in storage.) I have not yet seen a use of XML/JSON with a very high ratio.
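To make the break-even concrete, here is a rough back-of-the-envelope sketch of my own (not from the question), using the averages above: 11-character field names, ~4 characters of markup per field, and one delimiter per column per row in the table:

    # Rough break-even sketch using the averages assumed above.
    # Values cost the same in both representations, so only the
    # structural overhead per column and per row is compared.
    name_len = 11    # average field name length
    markup = 4       # average per-field markup in XML/JSON
    cell_cost = 1    # one delimiter per column per row in the table

    def overhead(presence):
        """presence = fraction of objects in which a given field appears."""
        xml_json = presence * (name_len + markup)  # paid only when the field is present
        table = cell_cost                          # paid for every row, even when empty
        return xml_json, table

    for p in (1.0, 0.5, 1 / 15, 0.01):
        x, t = overhead(p)
        print(f"presence {p:.3f}: xml/json ~{x:.2f} chars, table {t} char per cell")
    # XML/JSON only uses less space once presence < 1 / (11 + 4), i.e. ~1/15.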

Aren't most real uses of XML/JSON forced and inefficient? Shouldn't related data be stored and queried in relations (tables)? What am I missing?

Example conversion from XML to a table

Object1

<aaaaahlongfieldname>1</aaaaahlongfieldname>
<b>B
  <c>C</c> 
</b>

Object2

<aaaaahlongfieldname>2</aaaaahlongfieldname>    
<b><c><d>D</d></c></b>
<ba>BA</ba>
<ba "xyz~">BA</ba>
<c>C</c> 

Both converted to a CSV-like table (delimiter declaration, header, line 1, line 2):

delimiter=,   
aaaaahlongfieldname,b,b/c,b/c/d,ba,ba-xyz~,c
1,B,C,,,,
2,,,D,BA,BA,C
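
For illustration, here is one possible way to derive the '/'-separated path columns above (a sketch of my own, not from the question; the attribute column ba-xyz~ and the choice of rare delimiter/escape symbols are left out):

    # Sketch of flattening nested XML elements into '/'-separated
    # path columns as in the header above. Attribute columns such as
    # ba-xyz~ are omitted to keep the example short.
    import xml.etree.ElementTree as ET

    def flatten(element, prefix=""):
        """Yield (path, text) pairs; nesting becomes 'parent/child'."""
        path = prefix + element.tag
        text = (element.text or "").strip()
        if text:
            yield path, text
        for child in element:
            yield from flatten(child, path + "/")

    object2 = ET.fromstring(
        "<row><aaaaahlongfieldname>2</aaaaahlongfieldname>"
        "<b><c><d>D</d></c></b><ba>BA</ba><c>C</c></row>"
    )
    fields = {}
    for child in object2:          # skip the artificial <row> wrapper
        fields.update(flatten(child))
    print(fields)  # {'aaaaahlongfieldname': '2', 'b/c/d': 'D', 'ba': 'BA', 'c': 'C'}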
  • the / and - symbols need to be escaped only in the header (where they encode nesting and attributes), not when they appear in values

  • but ,,,, could also be written as \4, an escaped count of delimiters in a row (when an escape symbol or string is declared as well; worth it with large numbers of empty fields). Since the escape character and the delimiter have to be escaped wherever they appear in values, symbols that hardly ever appear in the data could be declared for them automatically, as in the example below (see also the sketch after it):

    escape=~   
    delimiter=°  
    aaaaahlongfieldname°b°b/c°b/c/d°ba°ba-xyz~~°c
    1°B°C~4
    2°°°D°BA°BA°C
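
A minimal sketch of that run-length escaping (my own illustration, assuming the rare delimiter ° and escape ~ declared above, and values that contain neither character):

    # Minimal sketch of the ~N run-length escape for runs of empty
    # fields, assuming the rare delimiter ° and escape ~ declared
    # above, and values that contain neither character. Runs shorter
    # than 4 delimiters are left alone, matching the rows above.
    import re

    DELIM = "°"
    ESCAPE = "~"

    def compress_row(fields):
        row = DELIM.join(fields)
        return re.sub(DELIM + "{4,}",
                      lambda m: ESCAPE + str(len(m.group(0))), row)

    def expand_row(row):
        expanded = re.sub(ESCAPE + r"(\d+)",
                          lambda m: DELIM * int(m.group(1)), row)
        return expanded.split(DELIM)

    print(compress_row(["1", "B", "C", "", "", "", ""]))  # 1°B°C~4
    print(expand_row("2°°°D°BA°BA°C"))                    # back to 7 fields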
    

Validation / additional info: XML/JSON simply omits empty fields, so a missing field in a "row" cannot be noticed. A line of a table is only valid when the number of fields is correct, so (faulty) lines can be detected. And since columns have different data types, missing delimiters could usually be repaired quite easily.
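A tiny sketch of that field-count check (my own illustration; it ignores delimiters escaped inside values):

    # Sketch of the field-count validation described above: a row is
    # only structurally valid if it has exactly as many fields as the
    # header declares. Delimiters escaped inside values are ignored.
    def validate_row(header, row, delimiter=","):
        return len(row.split(delimiter)) == len(header.split(delimiter))

    header = "aaaaahlongfieldname,b,b/c,b/c/d,ba,ba-xyz~,c"
    print(validate_row(header, "1,B,C,,,,"))    # True  - all 7 fields present
    print(validate_row(header, "2,,,D,BA,BA"))  # False - one field lost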

Edit: On readability/editability: a good thing, of course. The first time one read XML or JSON it may have been self-explanatory, having already read HTML and JS, but is that all? Most of the time it is machines reading it, and sometimes developers, and neither may be entertained by the verbosity.

Answer 1:

The CSV in your example is quite an inefficient use of an 8-bit encoding. You're hardly even using 5 bits of entropy per character, clearly wasting 3 bits. Why not compress it?

The answer to all of these is that people make mistakes, and stronger typing trades efficiency for safety. It is impossible for a machine or a human to identify a transposed column in a CSV stream, whereas both JSON and XML would handle it automatically, and (assuming no hierarchy boundaries got crossed) everything would still work. Thirty years ago, when storage space was scarce and instructions per second were sometimes measured in the hundreds, using minimal amounts of decoration in protocols made sense. These days even embedded systems have relatively vast amounts of power and storage, so the trade-off for a little extra safety is much easier to make.

For tightly controlled data transfer, say between modules that my development team is working on, JSON works great. But when data needs to go between different groups, I strongly prefer XML, simply because it helps both sides understand what is happening. If the data needs to go across a "slow" pipe, compression will remove 98% of the XML "overhead".
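
A quick way to check how much of the markup compression actually reclaims (a sketch with a made-up repetitive payload; the exact ratio depends entirely on the real data):

    # Quick check of how much of the repetitive XML markup gzip
    # reclaims. The payload is made up for illustration only.
    import gzip

    record = "<aaaaahlongfieldname>1</aaaaahlongfieldname><b>B<c>C</c></b>"
    xml = "<objects>" + record * 1000 + "</objects>"

    raw = xml.encode("utf-8")
    packed = gzip.compress(raw)
    print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes, "
          f"saved: {100 * (1 - len(packed) / len(raw)):.1f}%")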



Answer 2:

The designers of XML were well aware that there was a high level of redundancy in the representation, and they considered this a good thing (I'm not saying they were right). Essentially, (a) redundancy costs nothing if you use data compression, (b) redundancy (within limits) helps human readability, and (c) redundancy makes it easier to detect and diagnose errors, which is especially important when XML is being hand-authored.