I'm currently researching MongoDB as a possible database option, and I'm having trouble dealing with Guid serialization. I thought at first maybe this was a bug in the C# driver's serialization, but now I think it's more likely a naive assumption on my part.
To convert the BSON base64 representations back and forth to Guids, I wrote a couple of little PowerShell functions:
function base64toguid
{
    param($str);
    $b = [System.Convert]::FromBase64String($str);
    $hex = "";
    foreach ($x in $b) {
        $hex += $x.ToString("x2");
    }
    $g = new-object -TypeName System.Guid -ArgumentList $hex;
    return $g;
}
function guidtobase64
{
    param($str);
    $g = new-object -TypeName System.Guid -ArgumentList $str;
    $b64 = [System.Convert]::ToBase64String($g.ToByteArray());
    return $b64;
}
An example of the issue I'm having:
:) guidtobase64("53E32701-9863-DE11-BD66-0015178A5E3C");
ASfjU2OYEd69ZgAVF4pePA==
:) base64toguid("ASfjU2OYEd69ZgAVF4pePA==");
Guid
----
0127e353-6398-11de-bd66-0015178a5e3c
And from the mongo shell:
:) mongo
MongoDB shell version: 1.6.5
connecting to: test
> b = new BinData(3, "ASfjU2OYEd69ZgAVF4pePA==");
BinData(3,"ASfjU2OYEd69ZgAVF4pePA==")
> b.hex();
127e353639811debd66015178a5e3c
>
So as you can see, the Guid I get back doesn't match what I put in. My function and hex() return the same thing. If you compare the original to the result:
53E32701-9863-DE11-BD66-0015178A5E3C
0127e353-6398-11de-bd66-0015178a5e3c
You can see that the first 3 sets of hex pairs are reversed, but the last 2 sets are not. This makes me think there is something about Guid.ToString() that I don't understand.
Can anyone educate me please?
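The bytes of a GUID are not stored in the same order as the hex digits in its ToString() representation: the first three groups are byte-reversed (little-endian), while the last two are not.
You should get the bytes with guid.ToByteArray() rather than going through ToString(), and you should rebuild the Guid with new Guid(byte[] b) rather than from a hex string built out of those bytes.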
To express this in pure C#:
public string GuidToBase64(Guid guid)
{
    return System.Convert.ToBase64String(guid.ToByteArray()); // Very similar to what you have.
}
public Guid Base64ToGuid(string base64)
{
    var bytes = System.Convert.FromBase64String(base64);
    return new Guid(bytes); // Note that I'm not building up a string to represent the GUID.
}
Take a look at the "Basic Structure" section of the GUID article on Wikipedia for more details.
You will see that most of the data is stored in "Native" endianness... which is where the confusion is coming from.
To quote:
Data4 stores the bytes in the same order as displayed in the GUID text encoding (see below), but the other three fields are reversed on little-endian systems (for example Intel CPUs).
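To make the byte layout concrete, here is a minimal sketch using the Guid from the question. The class name GuidByteOrderDemo is only an illustrative placeholder; the calls themselves (Guid.ToByteArray, the Guid(byte[]) constructor, BitConverter.ToString) are standard .NET.

```csharp
using System;

public static class GuidByteOrderDemo
{
    public static void Main()
    {
        var guid = new Guid("53E32701-9863-DE11-BD66-0015178A5E3C");
        byte[] bytes = guid.ToByteArray();

        // Hex dump of the raw bytes in array order. The first three groups
        // come out byte-reversed; the last eight bytes match the text form.
        Console.WriteLine(BitConverter.ToString(bytes));
        // 01-27-E3-53-63-98-11-DE-BD-66-00-15-17-8A-5E-3C

        // Going back through the byte[] constructor restores the original value.
        Console.WriteLine(new Guid(bytes) == guid); // True
    }
}
```

The dump matches what hex() printed in the mongo shell, which is why feeding those hex digits to the string constructor produces a different Guid.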
Edit:
PowerShell version:
function base64toguid
{
    param($str);
    $b = [System.Convert]::FromBase64String($str);
    $g = new-object -TypeName System.Guid -ArgumentList (,$b);
    return $g;
}
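As an aside, you can optionally trim the "==" off of the end of your string, since it is just padding (which may help if you are trying to save space). Note that System.Convert.FromBase64String expects the padding to be present, so you have to add it back before decoding.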
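A rough sketch of the trimmed form, in case it is useful. The names ShortGuidDemo, GuidToShortBase64 and ShortBase64ToGuid are made up for illustration, and the decoder assumes its input is always the 22-character trimmed encoding of a 16-byte Guid.

```csharp
using System;

public static class ShortGuidDemo
{
    // Hypothetical helper: base64 of the Guid bytes without the trailing "==".
    public static string GuidToShortBase64(Guid guid)
    {
        return Convert.ToBase64String(guid.ToByteArray()).TrimEnd('=');
    }

    // Hypothetical helper: assumes a 22-character trimmed string, so exactly
    // two padding characters are restored before decoding.
    public static Guid ShortBase64ToGuid(string s)
    {
        return new Guid(Convert.FromBase64String(s + "=="));
    }

    public static void Main()
    {
        var guid = new Guid("53E32701-9863-DE11-BD66-0015178A5E3C");
        string s = GuidToShortBase64(guid);
        Console.WriteLine(s);                            // ASfjU2OYEd69ZgAVF4pePA
        Console.WriteLine(ShortBase64ToGuid(s) == guid); // True
    }
}
```

This only matters if the Guid is being stored or passed around as text; the padding is always "==" for 16 bytes, so nothing is lost by dropping it.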
You need to call the Guid constructor that takes a byte array. PowerShell needs special syntax for that: if you just pass $b, it unrolls the array into 16 separate arguments and tells you it can't find a constructor that takes 16 arguments. So you have to wrap the byte array in another array, using the unary comma operator:
$g = new-object -TypeName System.Guid -ArgumentList (,$b)
Looking at the C# driver documentation on the MongoDB website, it turns out that there is an implicit conversion provided for System.Guid.
So in C# (sorry, my PowerShell is a little rusty), you would just write:
Guid g = Guid.NewGuid(); //or however your Guid is initialized
BsonValue b = g;
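I imagine that the reverse will also work, although I have not checked whether that conversion is implicit; an explicit cast is the safer way to write it:
BsonValue b = // obtained this from somewhere
Guid g = (Guid)b;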
If you have no specific need to serialize the Guid as base64, then converting directly to binary will be much less work (note that there will be no endian issues, for example). Also, the data will be stored in binary on the server, so it will be more space efficient than using base64.