Currently our .NET application constructs XML data in memory that we persist to a SQL Server database. The XElement object is converted to a string using ToString() and then stored in a varchar(MAX) column in the DB. We dind't want to use the SQL XML datatype as we didn't need any validation and SQL doesn't need to query the XML at any stage.
Although this implementation works fine, we want to reduce the size of the database by compressing the XML before storing it, and decompressing it after retrieving it. Does anyone have any sample code for compressing an XElement object (and decompressing would be great too)? Also, what changes would I need to make to the data type of the database column so that we can fully take advantage of this compression?
I have investigated again the XML datatype SQL Server 2005 offers, and the validation overhead it offers is too high for us to consider using it. Also, although it does compress the XML somewhat, it doesn't as much compression as the .NET DeflateStream class.
I have tested the DeflateStream class by writing the XML we use to disk, and then saving the comrpessed version as a new file. The results are great, a 16kb file goes down to a 3kb file, so it's jsut a case of getting this to work in memory and saving the resulting data to the DB. Does anyone have any sample code to do the compression, and should I change the varcahr(MAX) colum to type to maybe varbinary?
Thanks in advance
I know you tagged the question SQL 2005, but you should consider upgrading to SQL 2008 and using the wonderful new compression capabilities that come with it. Is out-of-the-box, transparent for your application and will save you a huge implementation/test/support cost.
I think you should also re-test the XML column. It stores in binary, I know, not as text. It could be smaller, and may not perform badly, even if you don't actually need the additional features.
Besides possibly compressing the string itself (perhaps using LBushkin's Base64 method above), you probably want to start with making sure you kill all the whitespace. The default XElement.ToString() method saves the element with "indenting". You need to use the ToString(SaveOptions options) method (using SaveOptions.DisableFormatting) if you want to make sure you've just got the tags and data.
This article may help you get a start.
The following snippet can compress a string and return a base-64 coded result:
EDIT: As an aside, you may want to use CLOB formats even when storing XML as text because varchars have a very limited length - which XML can often quickly exceed.