c string compare vs hash compare

2019-04-11 05:24发布

I need to compare a string to multiple other constant strings in c. I am curious which is faster, to hash the string I am going to compare and compare it to all the other constant string hashes or just compare the strings as strings. thank you in advance

thank you for the answers I am going to be doing many comparisons. can anyone give me a good, fast, low resource intensive algorithm to use? The only hash I know of is MD5 and I have a feeling that is over kill.

I also want to add that the strings are maybe 20 or 30 characters long at the max with most being around 7.

10条回答
贪生不怕死
2楼-- · 2019-04-11 05:48

thank you for the answers I am going to be doing many comparisons. can anyone give me a good, fast, low resource intensive algorithm to use? The only hash I know of is MD5 and I have a feeling that is over kill.

Murmur hash is simple, fast and behaves well on statistical tests.

查看更多
Deceive 欺骗
3楼-- · 2019-04-11 05:50

If you are trying to match a subject string against a set of other strings, you might consider using the Aho-Corasick String Matching Algorithm. It uses a trie to match the subject against all of the target strings in a single pass (it's also quite simple to implement).

查看更多
劫难
4楼-- · 2019-04-11 05:55

Equality of a hash value does not guarantee equality - a mismatch will guarantee inequality, though. If you're going to need to compare a lot of strings against your collection the a hash would be great - if it's a one-off comparison (unlikely I guess) then strcmp will do nicely.

查看更多
男人必须洒脱
5楼-- · 2019-04-11 05:56

It greatly depends on the length of the strings and the complexity of your hash function. Implement and benchmark yourself would be the best answer...

查看更多
贪生不怕死
6楼-- · 2019-04-11 05:57

If your constant strings are known at compile time, take a look at the idea of a "perfect hash".

Wikipedia: A perfect hash function for a set S is a hash function that maps distinct elements in S to distinct integers, with no collisions.

That "no collisions" thing saves you work. Possibilities for further reading and implementations are:

查看更多
Animai°情兽
7楼-- · 2019-04-11 06:00

I think if you have a static list of strings, I would store them in a sorted array and then use bsearch to determine if a string is in that list. This returns NULL if it does not exist, or a pointer to the value should it exist and is probably faster than a linear search or hashing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* cmp function for qsort and bsearch */
static int pstrcmp(const void *a, const void *b)
{
  return strcmp(*(char * const *)a, *(char * const *)b);
}

/* check an input against the list of known strings */
static char *check_for_match(char *input)
{
  static char *static_list[] = { "one", "two", "three", "four", "five" };
  static int nelems;

  /* this sorts the list, for demonstration purposes, but if the list
     is static then it could be sorted prior to compiling */
  if (! nelems)
  {
    nelems = sizeof(static_list) / sizeof(*static_list);
    qsort(static_list, nelems, sizeof(*static_list), pstrcmp);
  }


  return bsearch(&input, static_list, nelems, sizeof(*static_list), pstrcmp);
}

int main(int argc, char *argv[])
{
  if (check_for_match("should_not_match"))
  {
    printf("Match found.\n");
  } else {
    printf("No match found.\n");
  }

  if (check_for_match("two"))
  {
    printf("Match found.\n");
  } else {
    printf("No match found.\n");
  }
  return EXIT_SUCCESS;
}
查看更多
登录 后发表回答