Itroduction
I'm currently working on John Conway's Game of Life in js. I have the game working (view here) and i'm working on extra functionalities such as sharing your "grid / game" to your friends. To do this i'm extracting the value's of the grid (if the cell is alive or dead) into a long string of 0's and 1's.
This string has a variable length since the grid is not always the same size. for example:
grid 1 has a length and width of 30 => so the string's length is 900
grid 2 has a length and width of 50 => so the string's length is 2500
The problem
As you can see these string's of 0's and 1's are way too long to copy around and share.
However hard i try I don't seem to be able to come up with a code that would compress a string this long to a easy to handle one.
Any ideas on how to compress (and decompress) this?
I have considered simply writing down every possible grid option for the gird sizes 1x1 to 100x100 and giving them a key/reference to use as sharable code. Doing that by hand would be madness but maybe any of you has an idea on how to create an algorithm that can do this?
If we make the assumption than the grid contains much more 0's than 1's, you may want to try this simple compression scheme:
Below is an example with a 16x16 grid:
This will produce the string "Z02z07z02ZZ380838z38ZZz8z14z08Zz8Zz".
The unpacking process is doing the exact opposite:
Note that this code will only work correctly if the length of the input string is a multiple of 4. If it's not the case, you'll have to pad the input and crop the output.
EDIT : 2nd method
If the input is completely random -- with roughly as many 0's as 1's and no specific repeating patterns -- the best you can do is probably to convert the binary string to a BASE64 string. It will be significantly shorter (this time with a fixed compression ratio of about 17%) and can still be copied/pasted by the user.
Packing:
Will produce the string "66ML4aVaDd/+VXDhtTF/lH2lD2fD3FcPZs2MiaiDPAA=".
Unpacking:
Or if you want to go a step further, you can consider using something like base91 instead, for a reduced encoding overhead.
LZ-string
Using LZ-string I was able to compress the "code" quite a bit.
By simply compressing it to base64 like this:
var compressed = LZString.compressToBase64(string)
Decompressing is also just as simple as this:
var decompressed = LZString.decompressFromBase64(compressed)
However the length of this compressed string is still pretty long given that you have about as many 0s as 1s (not given in the example)
example
But the compression does work.
ANSWER
For any of you who are wondering how exactly I ended up doing it, here's how:
First I made sure every string passed in would be padded with leading 0s untill it was devidable by 8. (saving the amount of 0s used to pad, since they're needed while decompressing)
I used Corstian's answer and functions to compress my string (interpreted as binary) into a hexadecimal string. Although i had to make one slight alteration.
Not every binary substring with a lenght of 8 will return exactly 2 hex characters. so for those cases i ended up just adding a 0 in front of the substring. The hex substring will have the same value but it's length will now be 2.
Next up i used a functionality from Arnaulds answer. Taking every double character and replacing it with a single character (one not used in the hexadecimal alphabet to avoid conflict). I did this twice for every hexadecimal character.
Since most grids are gonna have more dead cells then alive ones, I made sure the 0s would be able to compress even further, using Arnaulds method again but going a step further.
This resulted in
Z
representing 4096 (binary) 0sThe last step of the compression was adding the amount of leading 0s in front of the compressed string, so we can shave those off at the end of decompressing. This is how the returned string looks in the end.
Decompressing is practically doing everything the other way around.
Firstly splitting the number that represents how many leading 0s we've used as padding from the compressed string.
Then using Arnaulds functionality, turning the further "compressed" characters back into hexadecimal code.
Taking this hex string and turning it back into binary code. Making sure, as Corstian pointed out, that every binary substring will have a length of 8. (ifnot we pad the substrings with leading 0s untill the do, exactly, have a length of 8)
And then the last step is to shave off the leading 0s we've used as padding to make the begin string devidable by 8.
The functions
Function I use to compress:
The function I use to decompress:
URL shortener
I still found the "compressed" strings a little long for sharing / pasting around. So i used a simple URL shortener (view here) to make this process a little easier for the user.
Now you might ask, then why did you need to compress this string anyway?
Here's why:
First of all, my project is hosted on github pages (gh-pages). The info page of gh-pages tells us that the url can't be any longer than 2000 chars. This would mean that the max grid size would be the square root of 2000 - length of the base url, which isn't that big. By using this "compression" we are able to share much larger grids.
Now the second reason why is that, it's a challange. I find dealing with problems like these fun and also helpfull since you learn a lot.
Live
You can view the live version of my project here. and/or find the github repository here.
Thankyou
I want to thank everyone who helped me with this problem. Especially Corstian and Arnauld, since i ended up using their answers to reach my final functions.
Sooooo.... thanks guys! apriciate it!
In case it wasn't already obvious, the string you're trying to store looks like a binary string.
Counting systems
Binary is a number in base-2. This essentially means that there are two characters being used to keep count. Normally we are used to count with base-10 (decimal characters). In computer science the hexadecimal system (base-16) is also widely being used.
Since you're not storing the bits as bits but as bytes (use
var a = 0b1100001;
if you ever wish to store them like bits) the 'binary' you wish to store just takes as much space as any other random string with the same length.Since you're using the binary system each position just has 2 possible values. When using the hexadecimal value a single position can hold up to 16 possible values. This is already a big improvement when it comes to storing the data compactly. As an example
0b11111111
and0xff
both represents the decimal number 255.In your situation that'd shave 6 bytes of every 8 bytes you have to store. In the end you'd be stuck with a string just 1/4th of the length of the original string.
Javascript implementation
Essentially what we want to do is to interpret the string you store as binary and retrieve the hexadecimal value. Luckily JavaScript has built in functionality to achieve stuff like this:
We're saying "parse this string and interpret it like a base-2 number and store it as a hexadecimal string".
Decoding is practically the same. Replace all occurrences of the number 8 in the example above with 2 and vice versa.
Please note
A prerequisite for this code to work correctly is that the binary length is dividable by 8. See the following example:
When decoding you should add leading zeros until you have a full byte (8 bits).
In case the positions you have to store are not dividable by 8 you can, for example, add padding and add a number to the front of the output string to identify how much positions to strip.
Wait, there's more
To get even shorter strings you can build a lookup table with 265 characters in which you search for the character associated with the specific position. (This works because you're still storing the hexadecimal value as a string.) Sadly neither the ASCII nor the UTF-8 encodings are suited for this as there are blocks with values which have no characters defined.
It may look like:
This way you can have 265 possible values at a single position, AND you have used your byte to the fullest.
Please note that no real compression is happening here. We're rather utilising data types to be used more efficiently for your current use case.