When using SQL or MySQL (or any relational database, for that matter), I understand that saving the data in regular columns is better for indexing and other purposes...
The thing is, loading and saving JSON data is sometimes a lot simpler, and makes development easier.
Are there any "golden rules" for saving raw JSON data in the DB?
Is it an absolutely wrong practice to do so?
SUMMARY
Very nice answers were given, but no doubt the best organized is the answer by @Shnugo, which deserves the bounty.
Would also like to point out the answers given by @Gordon Linoff and @Amresh Pandey for explaining other special use cases.
Thank god, and good job everyone!
Newer versions of SQL Server provide functions for processing JSON text. Information formatted as JSON can be stored as text in standard SQL Server columns, and SQL Server provides functions that can retrieve values from these JSON objects.
This simple structure is similar to the standard NoSQL collection that you can create in NoSQL databases (e.g. Azure DocumentDB or MongoDB), where you just have a key that represents an ID and a value that represents the JSON.
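As a sketch, such a key/value "collection" might be created like this (the table, column, and constraint names are illustrative, since the original code block did not survive):

```sql
-- A minimal key/value "collection": an ID plus a JSON document stored as text.
-- The ISJSON() check rejects malformed documents at insert time.
CREATE TABLE Person (
    _id INT IDENTITY CONSTRAINT PK_JSON_ID PRIMARY KEY,
    value NVARCHAR(MAX)
        CONSTRAINT [Content should be formatted as JSON] CHECK (ISJSON(value) > 0)
);
```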
Note that NVARCHAR is not just plain text. SQL Server has a built-in text compression mechanism that can transparently compress data stored on disk. Compression depends on the language and can go up to 50% depending on your data (see UNICODE compression).
The key difference between SQL Server and other plain NoSQL databases is that SQL Server enables you to use a hybrid data model, where you can store several JSON objects in the same “collection” and combine them with regular relational columns.
As an example, imagine that we know that every person in your collection will have FirstName and LastName, and that you can store general information about the person as one JSON object, and phone numbers/email addresses as separate objects. In SQL Server 2016 we can easily create this structure without any additional syntax:
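A sketch of that structure (the column names follow the example in the text; the exact DDL is an assumption, since the original code block did not survive):

```sql
-- Strongly typed columns for the data every person has,
-- plus NVARCHAR(MAX) columns holding JSON for the variable parts.
CREATE TABLE Person (
    PersonID INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100) NOT NULL,
    LastName NVARCHAR(100) NOT NULL,
    AdditionalInfo NVARCHAR(MAX) NULL,
    PhoneNumbers NVARCHAR(MAX) NULL,
    EmailAddresses NVARCHAR(MAX) NULL
        CONSTRAINT [Email addresses must be formatted as JSON array]
        CHECK (ISJSON(EmailAddresses) > 0)
);
```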
Instead of a single JSON object, you can organize your data in this “collection”. If you do not want to explicitly check the structure of each JSON column, you don’t need to add a JSON CHECK constraint on every column (in this example I have added a CHECK constraint only on the EmailAddresses column).
If you compare this structure to the standard NoSQL collection, you might notice that you will have faster access to strongly typed data (FirstName and LastName). Therefore, this solution is a good choice for hybrid models where you can identify some information that is repeated across all objects, while other variable information can be stored as JSON. This way, you can combine flexibility and performance.
If you compare this structure with the schema of the Person table in the AdventureWorks database, you might notice that we have removed many related tables.
Besides the simplicity of the schema, your data access operations will be simpler compared to a complex relational structure. Now you can read a single table instead of joining several tables. When you need to insert a new person with related information (email addresses, phone numbers), you can insert a single record into one table, instead of inserting one record into the AdventureWorks Person table, taking the identity column to find the foreign key that will be used to store phones, email addresses, etc. In addition, in this model you can easily delete a single person's row without cascading deletes via foreign key relationships.
NoSQL databases are optimized for simple read, insert, and delete operations; SQL Server 2016 enables you to apply the same logic in a relational database.
JSON constraints
In the previous examples, we have seen how to add a simple constraint that validates that the text stored in a column is properly formatted. Although JSON does not have a strong schema, you can also add complex constraints by combining functions that read values from the JSON with standard T-SQL functions:
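The original code block is missing here; a plausible sketch, assuming the AdditionalInfo JSON column from above holds an age field (the path and constraint name are made up):

```sql
-- Combine JSON_VALUE with a standard T-SQL function in a CHECK constraint:
-- the extracted $.age value must be numeric.
ALTER TABLE Person
ADD CONSTRAINT [Age must be a number]
CHECK (ISNUMERIC(JSON_VALUE(AdditionalInfo, '$.age')) > 0);
```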
Note that CHECK constraints might slow down your insert/update processes, so you might want to avoid them if you need faster write performance.
Compressed JSON storage
If you have large JSON text, you can explicitly compress it using the built-in COMPRESS function. In the following example, the compressed JSON content is stored as binary data, and we have a computed column that decompresses the JSON back to the original text using the DECOMPRESS function:
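Again, the original code block did not survive; a sketch of the pattern described (names and sample data are illustrative):

```sql
-- Store compressed JSON as VARBINARY and expose the original text
-- through a computed column that decompresses on read.
CREATE TABLE PersonCompressed (
    _id INT IDENTITY PRIMARY KEY,
    data VARBINARY(MAX),
    value AS CAST(DECOMPRESS(data) AS NVARCHAR(MAX))
);

INSERT INTO PersonCompressed (data)
VALUES (COMPRESS(N'{"name":"Mickey Mouse","age":90}'));
```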
The COMPRESS and DECOMPRESS functions use standard GZip compression. If your client can handle GZip compression (e.g. a browser that understands gzip content), you can directly return the compressed content. Note that this is a performance/storage trade-off. If you frequently query compressed data, you might have slower performance because the text must be decompressed each time.
Note: JSON functions are available only in SQL Server 2016+ and Azure SQL Database.
More can be read at the source of this article:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/23/storing-json-in-sql-server/
JSON is not great in relational DBs. If you unfold the JSON into columns and store it in the DB, that's great, but storing JSON as a blob is next to using it as a data archival system.
There could be several reasons for not unfolding the JSON and storing it in a single column, but the decision would have been taken because the values in that JSON field will not be used for any querying (or the values have already been unfolded into columns).
Also, most of the JSON processing, if the field were queried at all, would happen outside the SQL environment, as SQL is just not meant for JSON processing. The real question then becomes: where do I store this JSON? Do I just leave it as flat files and, when required, query it via some other system (Spark/Hive/etc.)?
I would agree with your DB artist: don't use an RDBMS for archival. There are cheaper options. Also, JSON blobs can get huge and can start bogging down DB disk space over time.
The "golden rule" I use, in a hand-wavey sort of way, is that if I need the JSON in its raw format, it's okay to store it. If I have to make a special point of parsing it, then it's not.
For instance, if I'm creating an API that sends out raw JSON, and for whatever reason this value isn't going to change, then it's okay to store it as raw JSON. If I have to parse it, change it, update it, etc., then not so much.
The main questions are what you are going to do with this data and how you need to query it.
JSON (like XML) is great for data exchange, small storage, and generically defined structures, but it cannot participate in the typical actions you run within your RDBMS. In most cases it will be better to transfer your JSON data into normal tables and re-create the JSON when you need it.
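That transfer-and-recreate round trip can be sketched with SQL Server 2016's OPENJSON and FOR JSON (the table and property names here are made up for illustration):

```sql
DECLARE @json NVARCHAR(MAX) =
    N'[{"firstName":"Mickey","lastName":"Mouse"}]';

-- Shred the incoming JSON into a normal table...
INSERT INTO dbo.Person (FirstName, LastName)
SELECT FirstName, LastName
FROM OPENJSON(@json)
     WITH (FirstName NVARCHAR(100) '$.firstName',
           LastName  NVARCHAR(100) '$.lastName');

-- ...and re-create the JSON only when a caller needs it.
SELECT FirstName AS firstName, LastName AS lastName
FROM dbo.Person
FOR JSON PATH;
```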
XML / JSON and 1.NF
The first rule of normalisation dictates never to store more than one bit of information in one column. You see a column "PersonName" with a value like "Mickey Mouse"? You point to this and cry: Change that immediately!
What about XML or JSON? Are these types breaking 1.NF? Well, yes and no...
It is perfectly okay to store a complete structure as one bit of information, if it actually is one bit of information. You get a SOAP response and want to store it because you might need it for future reference (but you will not use this data for your own processes)? Just store it as is!
Now imagine a complex structure (XML or JSON) representing a person (with their address, further details...). Now you put this into one column as PersonInCharge. Is this wrong? Shouldn't this rather live in properly designed related tables with a foreign key reference instead of the XML/JSON? Especially if the same person might occur in many different rows, it is definitely wrong to use the XML/JSON approach.
But now imagine the need to store historical data. You want to persist the person's data for a given moment in time. Some days later the person tells you a new address? No problem! The old address lives in the XML/JSON if you ever need it...
Conclusion: If you store the data just to keep it, it's okay. If this data is a unique portion, it's okay...
But if you need the internal parts regularly, or if it would mean redundant duplicate storage, it's not okay...
Physical storage
The following is for SQL Server and might be different on other RDBMSs.
XML is not stored as the text you see, but as a hierarchy tree. Querying this performs astonishingly well! This structure is not parsed at the string level!
JSON in SQL Server (2016+) lives in a string and must be parsed. There is no real native JSON type (like there is a native XML type). This might come later, but for now I'd assume that JSON will not be as performant as XML on SQL Server (see section UPDATE 2). Any need to read a value out of the JSON will need a hell of a lot of hidden string method calls...
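To make the difference in the querying model concrete, here is a small illustration (both functions exist in SQL Server 2016+; the sample data is made up):

```sql
DECLARE @json NVARCHAR(MAX) = N'{"person":{"name":"Mickey"}}';
DECLARE @xml  XML           = N'<person><name>Mickey</name></person>';

-- JSON: the string is parsed on every access.
SELECT JSON_VALUE(@json, '$.person.name') AS NameFromJson;

-- XML: the query runs against the internal hierarchy tree.
SELECT @xml.value('(/person/name/text())[1]', 'nvarchar(100)') AS NameFromXml;
```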
What does this mean for you?
Your lovable DB artist :-D knows that storing JSON as-is is against common principles of RDBMSs. He knows that there are some workarounds (depending on the RDBMS you are using), but most of them don't work the way you'd like...
The answer to your question, in short:
YES
You can store this just like any other exists-only content. We are storing many pictures as BLOBs, but we would not try to filter for all images with a flower...
NO
You might start with the JSON within a string column or as a BLOB and change this to physical tables when you need it. My magic crystal ball tells me this might be tomorrow :-D
UPDATE
Find some ideas about performance and disc space here: https://stackoverflow.com/a/47408528/5089204
UPDATE 2: More about performance...
The following addresses JSON and XML support in SQL Server 2016.
User @mike123 pointed to an article on an official Microsoft blog which seems to prove, in an experiment, that querying JSON is 10x faster than querying XML in SQL Server.
Some thoughts about that:
Some cross-checks with the "experiment":

- Not a single word about XQuery support! Find a product with a given ID within an array? JSON needs to read the whole lot and use a filter afterwards using WHERE, while XML would allow an internal XQuery predicate. Not to speak about FLWOR...
- Adding /text() to the XPath reduces the gap to less than 2x. In the related article, user "Mister Magoo" pointed this out already, but the click-bait title is still unchanged...
- With code as simple as in the "experiment", I'd guess most of the time is consumed by string methods like SUBSTRING and CHARINDEX :-D

The following code will show a more realistic experiment: using JSON and XML with more than one Product (a JSON array vs. sibling nodes), where GO 10 will run through the block ten times to avoid first-call bias.

The final result shows clearly that JSON is slower than XML (not by that much, about 1.5x on a still very simple example).
The final statement:
The test code
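The original test code did not survive extraction; a sketch in the spirit of the experiment described above (reading one Product value out of a JSON array vs. XML sibling nodes; names and data are made up) might look like:

```sql
DECLARE @json NVARCHAR(MAX) =
    N'{"Products":[{"ID":1,"Name":"Mouse"},{"ID":2,"Name":"Keyboard"}]}';
DECLARE @xml XML =
    N'<Products><Product ID="1">Mouse</Product><Product ID="2">Keyboard</Product></Products>';

-- JSON: open the whole array and filter afterwards with WHERE.
SELECT p.Name
FROM OPENJSON(@json, '$.Products')
     WITH (ID INT '$.ID', Name NVARCHAR(100) '$.Name') AS p
WHERE p.ID = 2;

-- XML: an XQuery predicate picks the matching node directly.
SELECT @xml.value('(/Products/Product[@ID=2]/text())[1]', 'nvarchar(100)');
```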
The result (SQL Server 2016 Express on an Acer Aspire v17 Nitro Intel i7, 8GB Ram)
This is too long for a comment.
If it were "absolutely wrong", then most databases would not support it. Okay, most databases support commas in the FROM clause, and I view that as "absolutely wrong". But support for JSON is new development, not a backward-compatible "feature".

One obvious case is when the JSON struct is simply a BLOB that is passed back to the application. Then there is no debate, other than the overhead of storing JSON, which is unnecessarily verbose for structured data with common fields in every record.
Another case is the "sparse" columns case. You have rows with many possible columns, but these vary from row to row.
Another case is when you want to store "nested" records in a record. JSON is powerful.
If the JSON has common fields across records that you want to query on, then you are usually better off putting these in proper database columns. However, data is complicated and there is a place for formats such as JSON.
I'll wave my magic wand. Poof! Golden Rules on use of JSON:
If MySQL does not need to look inside the JSON, and the application simply needs a collection of stuff, then JSON is fine, possibly even better.
If you will be searching on data that is inside the JSON and you have MariaDB 10.0.1 or MySQL 5.7 (with a JSON datatype and functions), then JSON might be practical. MariaDB 5.3's "Dynamic" columns are a variant of this.
If you are doing "Entity-Attribute-Value" stuff, then JSON is not good, but it is the least of several evils. http://mysql.rjweb.org/doc.php/eav
For searching by an indexed column, not having the value buried inside JSON is a big plus.
For searching by a range on an indexed column, or a FULLTEXT search or SPATIAL, JSON is not possible.
For WHERE a=1 AND b=2, the "composite" index INDEX(a,b) is great; you probably can't come close with JSON.
JSON works well with "sparse" data; INDEXing works, but not as well, with such. (I am referring to values that are 'missing' or NULL for many of the rows.)
JSON can give you "arrays" and "trees" without resorting to extra table(s). But dig into such arrays/trees only in the app, not in SQL.
JSON is worlds better than XML. (My opinion)
If you do not want to get into the JSON string except from the app, then I recommend compressing it (in the client) and storing it into a BLOB. Think of it like a .jpg: there's stuff in there, but SQL does not care.

State your application; maybe we can be more specific.