Efficient recent set-membership tests in C for ded

2019-06-11 02:01发布

I have an unlimited number of 12 byte messages arriving. The content can be treated as random and structureless. (The length is only important because it is shorter than most hashes.)

I want to deduplicate them.

One method is to store the last 1,000 messages in a circular buffer and check all 1,000 message for a match before accepting a message (and inserting it into the circular buffer for future checks).

What other methods are there, which could be more CPU and memory efficient?

标签: c duplicates set
1条回答
forever°为你锁心
2楼-- · 2019-06-11 02:26

12 Bytes seem quite small length. You can cast your byte array to string and then use a tree structure based on strings, via exploiting strcmp().

  1. Way to cast byte array to string

  2. Tree structure based on strings

Unless you form a skewed tree, O(logn) would be your worst case for deduplication. In such case, it is not hard to change to a self-balancing tree too.

Here my BST implementation using string type keys:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


struct Node
{
  char *key;
  char *value;
  struct Node *left;
  struct Node *right;
};

struct Node* newNode(char *strKey,char *strValue)
{
  struct Node *tmp = (struct Node*) malloc(sizeof(struct Node));
  tmp->key = strdup(strKey);
  tmp->value = strdup(strValue);
  tmp->left = NULL;
  tmp->right = NULL;
  return tmp;
}

struct Node* insert(struct Node* node, char *newKey, char *newValue)
{
  if (node == NULL)
    return newNode(newKey,newValue);

  int comparison = strcmp(newKey,node->key); 
  if (comparison < 0)
    node->left  = insert(node->left, newKey, newValue);
  else if (comparison > 0)
    node->right = insert(node->right, newKey, newValue);  
  else
  {
    printf("Error occured while insert to BST\n");
    return NULL;
  }

  return node;
}

struct Node* deleteNode(struct Node* node, char *key2del)
{
  if (node == NULL)
    return node;

  int comparison = strcmp(key2del,node->key);
  if (comparison < 0)
    node->left = deleteNode(node->left, key2del);
  else if (comparison > 0)
    node->right = deleteNode(node->right, key2del);
  else // where deletion occurs
  {
    if (node->left == NULL)
    {
      struct Node *tmp = node->right;
      free(node);
      return tmp;
    }
    else if (node->right == NULL)
    {
      struct Node *tmp = node->left;
      free(node);
      return tmp;
    }

    struct Node *tmp = node->right;
    while(tmp->left != NULL)
      tmp = tmp->left;

    node->key = tmp->key;
    node->right = deleteNode(node->right, tmp->key);
  }

  return node;
}
查看更多
登录 后发表回答