Question:
I'm currently using MD5 hashes but I would like to find something that will create a shorter hash that uses just [a-z][A-Z][0-9]. It only needs to be around 5-10 characters long.
Is there something out there that already does this?
Update:
I like the CRC32 hash. Is there a clean way of calculating it in .NET?
Update2:
I'm using the CRC32 function from the link Joe provided. How can I convert the uInt into the characters defined above?
Answer 1:
Every .NET string object has a GetHashCode() method. It returns a 32-bit integer.
Format it as hexadecimal, padded to an 8-character string.
Like so:
    string hashCode = String.Format("{0:X8}", sourceString.GetHashCode());
More on that: http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx
UPDATE: Added the remarks from the link above to this answer:
The behavior of GetHashCode is dependent on its implementation, which
might change from one version of the common language runtime to
another. A reason why this might happen is to improve the performance
of GetHashCode.
If two string objects are equal, the GetHashCode method returns
identical values. However, there is not a unique hash code value for
each unique string value. Different strings can return the same hash
code.
Notes to Callers
The value returned by GetHashCode is platform-dependent. It differs on
the 32-bit and 64-bit versions of the .NET Framework.
Answer 2:
Is your goal to create a URL shortener or to create a hash function?
If your goal is to create a URL shortener, then you don't need a hash function. In that case, you just want to pre-generate a sequence of cryptographically secure random numbers, and then assign each URL to be encoded a unique number from the sequence.
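You can do this using code like:
    using System.Collections.Generic;
    using System.Security.Cryptography;

    const int numberOfNumbersNeeded = 100;
    const int numberOfBytesNeeded = 8;

    // Pre-generated pool of random 8-byte values, one per URL
    var randomNumbers = new List<byte[]>();
    var randomGen = RandomNumberGenerator.Create();
    for (int i = 0; i < numberOfNumbersNeeded; ++i)
    {
        var bytes = new Byte[numberOfBytesNeeded];
        randomGen.GetBytes(bytes);   // fill the buffer with cryptographically secure random bytes
        randomNumbers.Add(bytes);
    }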
Using the cryptographic number generator will make it very difficult for people to predict the strings you generate, which I assume is important to you.
You can then convert the 8 byte random number into a string using the chars in your alphabet. This is basically a change of base calculation (from base 256 to base 62).
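A minimal sketch of that change of base, assuming the 8 bytes are read as one unsigned 64-bit integer and assuming one particular ordering of the 62 symbols. The names RandomKey, ToBase62 and NewKey are made up for illustration and are not part of any library:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class RandomKey
{
    // Assumed symbol order; any fixed ordering of the 62 characters works.
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Change of base: repeatedly divide by 62 and collect the remainders.
    public static string ToBase62(ulong value)
    {
        if (value == 0) return Alphabet[0].ToString();
        var sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Alphabet[(int)(value % 62)]);
            value /= 62;
        }
        return sb.ToString();
    }

    // Draws 8 cryptographically secure random bytes and encodes them.
    public static string NewKey()
    {
        var bytes = new byte[8];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return ToBase62(BitConverter.ToUInt64(bytes, 0));
    }
}
```

With 8 bytes the key is at most 11 characters long, because 62^11 is larger than 2^64. Drawing fewer bytes gives a shorter key at the cost of a smaller key space.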
Answer 3:
I don't think URL shortening services use hashes. I think they just keep a running alphanumeric string that is incremented with every new URL and stored in a database.
If you really need to use a hash function, have a look at this link: some hash functions
Also, a bit off-topic, but depending on what you are working on this might be interesting: Coding Horror article
Answer 4:
Just take a Base36 (case-insensitive) or Base64 of the ID of the entry.
So, let's say I wanted to use Base36:
(ID - Base36)
1 - 1
2 - 2
3 - 3
10 - A
11 - B
12 - C
...
10000 - 7PS
22000 - GZ4
34000 - Q8G
...
1000000 - LFLS
2345000 - 1E9EW
6000000 - 3KLMO
You could keep these even shorter if you went with Base64, but then the URLs would be case-sensitive and would include two non-alphanumeric characters. As you can see, Base36 still gives you a nice, neat alphanumeric key, with a guarantee that there will be no collisions!
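A rough sketch of the ID-to-Base36 mapping, in case it helps. The class name Base36 and its Encode/Decode methods are hypothetical, and the ID is assumed to be a non-negative integer such as an auto-increment database key:

```csharp
using System;
using System.Text;

public static class Base36
{
    private const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Turns a non-negative ID into its Base36 representation.
    public static string Encode(long id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        if (id == 0) return "0";
        var sb = new StringBuilder();
        while (id > 0)
        {
            sb.Insert(0, Digits[(int)(id % 36)]);
            id /= 36;
        }
        return sb.ToString();
    }

    // Reverses Encode; accepts lower-case input because Base36 is case-insensitive.
    public static long Decode(string key)
    {
        long id = 0;
        foreach (char c in key.ToUpperInvariant())
        {
            int digit = Digits.IndexOf(c);
            if (digit < 0) throw new FormatException("Not a Base36 digit: " + c);
            id = checked(id * 36 + digit);
        }
        return id;
    }
}
```

For example, Base36.Encode(10000) gives "7PS" and Base36.Decode("7ps") gives back 10000, so the short key can be mapped straight back to the row ID without storing it separately.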
Answer 5:
You cannot use a short hash, as you need a one-to-one mapping from the short version to the actual value. For a short hash the chance of a collision would be far too high. Normal, long hashes would not be very user-friendly (and even though the chance of a collision would probably be small enough then, it still wouldn't feel "right" to me).
TinyURL.com seems to use an incremented number that is converted to Base 36 (0-9, A-Z).
Answer 6:
You can decrease the number of characters in the MD5 hash by encoding it as alphanumerics. MD5 is usually written in hex, so each character has 16 possible values (4 bits). [a-zA-Z0-9] offers 62 possible values, so each output character can carry almost 6 bits, and a 128-bit hash shrinks from 32 characters to about 22.
EDIT:
Here's a function that takes a number in the range 0-63 (6 bits of the hash) and returns a character from [0-9a-zA-Z]. This should give you an idea of how to implement it. Note that there may be some issues with the types; I didn't test this code.
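    char num2char( unsigned int x ){
        if( x < 26 ) return (char)('a' + (int)x);
        if( x < 52 ) return (char)('A' + (int)x - 26);
        if( x < 62 ) return (char)('0' + (int)x - 52);
        /* only 62 symbols exist, so 62 and 63 reuse '0' and '1' (not reversible) */
        if( x == 62 ) return '0';
        if( x == 63 ) return '1';
        return '?'; /* x is out of the 0-63 range */
    }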
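Answer 7:
First I get a list of random numbers (the code below does not force them to be distinct). Then I select the matching char from the base string for each one, append them, and return the result. I'm selecting 5 chars, which gives 62^5 = 916,132,832 possible strings from the 62-character alphabet (6,471,002 is only the number of unordered selections of 5 distinct chars). The second part is to check against the DB to see whether the key already exists; if not, save the short URL.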
    const string BaseUrlChars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private static string ShortUrl
    {
        get
        {
            const int numberOfCharsToSelect = 5;
            int maxNumber = BaseUrlChars.Length;
            var rnd = new Random();
            // Pick one random index into BaseUrlChars per output character
            var numList = new List<int>();
            for (int i = 0; i < numberOfCharsToSelect; i++)
                numList.Add(rnd.Next(maxNumber));
            // Map each index to its character and concatenate
            return numList.Aggregate(string.Empty, (current, num) => current + BaseUrlChars.Substring(num, 1));
        }
    }
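The "check against db" step is not shown above. Here is one way it might look, with an in-memory Dictionary standing in for the database table. ShortUrlStore, Add and TryResolve are invented names for this sketch, and the retry limit of 10 is an arbitrary assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class ShortUrlStore
{
    private const string BaseUrlChars =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Stand-in for the database table: short key -> long URL.
    private readonly Dictionary<string, string> _urls = new Dictionary<string, string>();

    // One shared generator for the lifetime of the store.
    private readonly Random _rnd = new Random();

    private string RandomKey(int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append(BaseUrlChars[_rnd.Next(BaseUrlChars.Length)]);
        return sb.ToString();
    }

    // Generates keys until an unused one is found, then saves the mapping.
    public string Add(string longUrl, int length = 5, int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            string key = RandomKey(length);
            if (!_urls.ContainsKey(key))
            {
                _urls[key] = longUrl;
                return key;
            }
        }
        throw new InvalidOperationException("Could not find a free key.");
    }

    public bool TryResolve(string key, out string longUrl)
    {
        return _urls.TryGetValue(key, out longUrl);
    }
}
```

Against a real database, a unique constraint on the key column would probably be the safer way to catch collisions, since a separate check-then-insert can race between two requests.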
Answer 8:
You can use CRC32. Its output is 32 bits (8 hex characters) long and it is used much like MD5, although it is not a cryptographic hash. Adding a timestamp to the actual value helps keep the results unique.
So it will look like http://foo.bar/abcdefg12.
Answer 9:
If you're looking for a library that generates tiny unique hashes from integers, I can highly recommend http://hashids.org/net/. I use it in many projects and it works fantastically. You can also specify your own character set for custom hashes.
Answer 10:
If you don't care about cryptographic strength, any of the CRC functions will do.
Wikipedia lists a bunch of different hash functions, including length of output. Converting their output to [a-z][A-Z][0-9] is trivial.
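For what it's worth, here is a sketch of both steps: a table-driven CRC-32 (the common reflected variant with polynomial 0xEDB88320, written out by hand rather than taken from a framework class) and a fixed-width conversion of the resulting uint to six [0-9A-Za-z] characters. The class name Crc32Key, its method names and the symbol order are assumptions made for this example:

```csharp
using System;
using System.Text;

public static class Crc32Key
{
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private static readonly uint[] Table = BuildTable();

    // Precomputes the 256-entry lookup table for the reflected polynomial.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit < 8; bit++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // 62^6 exceeds 2^32, so six base-62 digits always hold a uint; short values are left-padded with '0'.
    public static string ToBase62(uint value)
    {
        var chars = new char[6];
        for (int i = chars.Length - 1; i >= 0; i--)
        {
            chars[i] = Alphabet[(int)(value % 62)];
            value /= 62;
        }
        return new string(chars);
    }

    public static string ForString(string s)
    {
        return ToBase62(Compute(Encoding.UTF8.GetBytes(s)));
    }
}
```

Crc32Key.ForString(someUrl) then always yields a 6-character key. Keep in mind that with only 32 bits, collisions become likely once a few tens of thousands of values are stored.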
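Answer 11:
You could encode your MD5 hash with Base64 instead of hexadecimal; this way you get a shorter URL using mostly the characters [a-z][A-Z][0-9]. Note that standard Base64 also emits '+', '/' and '=' padding, which would have to be replaced or stripped.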
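Something along these lines might work as a starting point. It assumes the non-alphanumeric Base64 characters are simply dropped and the result is cut to the requested length. Md5Key and Shorten are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class Md5Key
{
    // Hashes the input with MD5, Base64-encodes the 16 bytes, keeps only
    // letters and digits, and returns at most `length` characters.
    public static string Shorten(string input, int length)
    {
        byte[] hash;
        using (var md5 = MD5.Create())
        {
            hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
        }
        string base64 = Convert.ToBase64String(hash); // 24 chars, may contain '+', '/', '='
        string alnum = new string(base64.Where(c => char.IsLetterOrDigit(c)).ToArray());
        return alnum.Length <= length ? alnum : alnum.Substring(0, length);
    }
}
```

Dropping characters and truncating both throw information away, so the result cannot be decoded back to the full hash, and the shorter the key, the more likely two inputs end up with the same one.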
Answer 12:
There's a wonderful but ancient program called btoa which converts binary to ASCII using upper- and lower-case letters, digits, and two additional characters. There's also the MIME base64 encoding; most Linux systems probably have a program called base64 or base64encode. Either one would give you a short, readable string from a 32-bit CRC.
Answer 13:
You could take the first alphanumeric 5-10 characters of the MD5 hash.