How would you go about designing a function for a

The domain of interest is string matching. Assume I have a structure like this.

typedef struct
{
    char *name,
    int (*function)();

} StringArray

StringArray s[] = 
{
    {"George", func1},
    {"Paul",   func2},
    {"Ringo",  func3},
    {"John",   func4},
    {"",       NULL}   /* End of list */ 
}

There are a fixed number of strings in the array. They are hard coded as in the example. If the table changes, there would be a need to re-evaluate the quality of the hash function.

I want to apply a hash function to a string, and if the string matches one in the array, then call the function. A perfect hash function is needed for this. No collisions are allowed.The purpose of requiring hashing is to get O(1) performance on the lookup.

What ideas do you have on designing a function to do this?

标签： c++ c string function hash

8条回答

贼婆χ

2楼-- · 2019-01-22 22:38

Well, there is no perfect hash function.

You have several that minimize collisions, but none eliminates them.

Can't advise one though :P

EDIT: The solution can't be finding a perfect hash function. The solution is to be aware of collisions. Generically a hash function has collisions. That obviously depends on the dataset and the size of the resulting hash code.

0人赞添加讨论(0) 举报

再贱就再见

3楼-- · 2019-01-22 22:40

The summary lists both C and C++. Which of them are you looking for? C and C++ are two distinct languages, and differ greatly in their string handling and data structures (and the fact that the C ones work in C++ doesn't change that).

Why, specifically, do you want a perfect hash function? Is it that you want to associate a string with a function, and thought that would be a good way to do it? Is this some sort of homework assignment? Do you have a reason not to use map<> in C++? (Or unordered_map<> if available?)

If you do need a perfect hash, what are the constraints on the strings? Will there be a certain fixed set you want to dispatch on? What about strings that don't match one of the set? Are you willing to accept hits from random strings, or is the number of incoming strings limited?

If you could edit your question to include information like that, we could be a lot more helpful.

EDIT (in response to the first two comments):

OK, we should look at C solutions, since you presumably want this for both your C and C++ work. You presumably want the performance, but have you tested? If we're dealing with strings coming in on the I/O system, the time there is likely to dwarf the dispatch time.

You are expecting arbitrary strings. It's a bit much to expect a perfect hash function that will avoid all collisions from random data, so you do need to consider that.

Have you considered a trie? It may be more efficient than a perfect hash function (or may not be), it should be fairly easy to implement in C, and it will avoid problems with redoing your list of dispatching strings or possible collisions.

0人赞添加讨论(0) 举报

贼婆χ

4楼-- · 2019-01-22 22:44

The final result of this exercise was to

Steal a number of string oriented hash functions off the net.
Build a sort of factory class which tested each of the functions against the data set with a range of mod operator values, looking for the smallest perfect hash that works with that function.
That factory class default constructor returns a string, which represents a set of arguments that when use picks the correct hash function, and mod size to give the perfect hash that requires the least amount of memory.
under normal usage, you simply instantiate the class with the returned arguments, and the class puts itself into a working state with the desired functions.
That constructor validates that there are no collisions and aborts if there are.
In the case where no perfect hash is found, it degrades into a binary search across a sorted version of the input table.

For the set of arrays I have in my domain, this seems to work very very well. A possible future optimization would be to do the same sort of testing, on substrings of the input. In the sample case, the first letter of each musicians name is sufficient to tell them apart. Then one would need to balance the cost of the actual hash function against the memory used.

My thanks to everyone who contributed ideas.

Evil

0人赞添加讨论(0) 举报

成全新的幸福

5楼-- · 2019-01-22 22:47

If collisions are absolutely not allowed, your only option is to keep track of every string in the database, which is probably not a best way to go.

What I would do is apply one of existing common strong hashing algorithms, such as: MD5 or SHA. There's miriads of samples all around, here's one for example: http://www.codeproject.com/KB/security/cryptest.aspx

0人赞添加讨论(0) 举报

Lonely孤独者°

6楼-- · 2019-01-22 22:49

You can use map

std::string foo() { return "Foo"; }
std::string bar() { return "Bar"; }

int main()
{
   std::map<std::string, std::string (*)()> m;
   m["foo"] = &foo;
   m["bar"] = &bar; 
}

0人赞添加讨论(0) 举报

劫难

7楼-- · 2019-01-22 22:53

Use a balanced binary tree. Then you KNOW behaviour is ALWAYS O(logn).

I strongly dislike hashes. People don't realise how much risk they take with their algorithm. They run some test data and then deploy in the field. I have NEVER seen a deployed hash algorithm get checked for behaviour in the field.

O(log n) is almost always acceptable in lieu of O(1).

0人赞添加讨论(0) 举报

1 2 下一页

How would you go about designing a function for a

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间