I have to write my own hash function. If I wanted to just make the simple hash function that maps each letter in the string to a numerical value (i.e. a=1, b=2, c=3, ...), is there a way I can perform this hash on a string without having to first convert it to a c-string to look at each individual char? Is there a more efficient way of hashing strings?
问题:
回答1:
Re the first question, sure, e.g, something like:
int hash = 0;
int offset = 'a' - 1;
for(string::const_iterator it=s.begin(); it!=s.end(); ++it) {
hash = hash << 1 | (*it - offset);
}
regarding the second, there are many better ways to hash strings. E.g., see here for a few C examples (easily translatable to C++ along the lines of the snippet above).
回答2:
From personal experience I know that this works and produces good distributions. (Plagiarised from http://www.cse.yorku.ca/~oz/hash.html):
djb2
this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
unsigned long hash(unsigned char *str) {
unsigned long hash = 5381;
int c;
while (c = *str++) {
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
}
return hash;
}
回答3:
You can examine each individual char from a std::string using the []
operator. However, you can look at Boost::Functional/Hash for guidance on a better hashing scheme. There is also a list of hashing functions in c located here.
回答4:
Here's a C (++) hash function that I found in Stroustrup's book:
int hash(const char *str)
{
int h = 0;
while (*str)
h = h << 1 ^ *str++;
return h;
}
If you're using it for a hash table (which Stroustrup does) then you can instead return the abs of the hash modulo a prime number. So instead
return (h > 0 ? h : -h) % N_BUCKETS;
for the last line.
回答5:
C++11 ships with a standard hashing function for strings.
https://en.cppreference.com/w/cpp/string/basic_string/hash
#include <string>
#include<functional> // hash
int main(){
std::string s = "Hello";
std::size_t hash = std::hash<std::string>{}(s);
}
回答6:
xor the characters together, four at a time.
回答7:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
// a variation on dan bernstein's algorithm
// [http://www.cse.yorku.ca/~oz/hash.html]
template<typename Int>
struct hash {
hash() : acc(5381) { }
template<typename Ch>
void operator()(Ch ch) { acc = ((acc << 5) + acc) ^ ch; }
operator Int() const { return acc; }
Int acc;
};
int main(int argc, char* argv[])
{
string s("Hellp, world");
cout << hex << showbase
<< for_each(s.begin(), s.end(), hash<unsigned long long>()) << '\n';
return 0;
}
回答8:
Another way for small strings:
int hash(const char* str) {
int hash = 0;
int c = 0;
while (c < std::strlen(str)) {
hash += (int)str[c] << (int)str[c+1];
c++;
}
return hash;
}
回答9:
You can make use of the member functions operator[] or at of the string class or iterators to access individual char of a string object without converting it to c-style char array.
To hash a string object to an integer you'll have to access each individual char of the string object which you can do as:
for (i=0; i < str.length(); i++) {
// use str[i] or str.at(i) to access ith element.
}