How would you implement a hashtable in language x?

2019-01-30 04:23发布

站内文章 / 前沿技术

40 0

祖国的老花朵

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

The point of this question is to collect a list of examples of hashtable implementations using arrays in different languages. It would also be nice if someone could throw in a pretty detailed overview of how they work, and what is happening with each example.

Edit:

Why not just use the built in hash functions in your specific language?

Because we should know how hash tables work and be able to implement them. This may not seem like a super important topic, but knowing how one of the most used data structures works seems pretty important to me. If this is to become the wikipedia of programming, then these are some of the types of questions that I will come here for. I'm not looking for a CS book to be written here. I could go pull Intro to Algorithms off the shelf and read up on the chapter on hash tables and get that type of info. More specifically what I am looking for are code examples. Not only for me in particular, but also for others who would maybe one day be searching for similar info and stumble across this page.

To be more specific: If you had to implement them, and could not use built-in functions, how would you do it?

You don't need to put the code here. Put it in pastebin and just link it.

回答1:

A hash table a data structure that allows lookup of items in constant time. It works by hashing a value and converting that value to an offset in an array. The concept of a hash table is fairly easy to understand, but implementing is obviously harder. I'm not pasting the whole hash table here, but here are some snippets of a hash table I made in C a few weeks ago...

One of the basics of creating a hash table is having a good hash function. I used the djb2 hash function in my hash table:

int ComputeHash(char* key)
{
    int hash = 5381;
    while (*key)
    hash = ((hash << 5) + hash) + *(key++);
    return hash % hashTable.totalBuckets;
}

Then comes the actual code itself for creating and managing the buckets in the table

typedef struct HashTable{
    HashTable* nextEntry;
    char*      key;
    char*      value;
}HashBucket;

typedef struct HashTableEntry{
    int totalBuckets;       // Total number of buckets allocated for the hash table
    HashTable** hashBucketArray; // Pointer to array of buckets
}HashTableEntry;
HashTableEntry hashTable;

bool InitHashTable(int totalBuckets)
{
    if(totalBuckets > 0)
    {
        hashTable.totalBuckets = totalBuckets;
        hashTable.hashBucketArray = (HashTable**)malloc(totalBuckets * sizeof(HashTable));
        if(hashTable.hashBucketArray != NULL)
        {
            memset(hashTable.hashBucketArray, 0, sizeof(HashTable) * totalBuckets);
            return true;
        }
    }
    return false;
}

bool AddNode(char* key, char* value)
{
    int offset = ComputeHash(key);
    if(hashTable.hashBucketArray[offset] == NULL)
    {
        hashTable.hashBucketArray[offset] = NewNode(key, value);
        if(hashTable.hashBucketArray[offset] != NULL)
            return true;
    }

    else
    {
        if(AppendLinkedNode(hashTable.hashBucketArray[offset], key, value) != NULL)
            return true;
    }
    return false;
}

HashTable* NewNode(char* key, char* value)
{
    HashTable* tmpNode = (HashTable*)malloc(sizeof(HashTable));
    if(tmpNode != NULL)
    {
        tmpNode->nextEntry = NULL;
        tmpNode->key   = (char*)malloc(strlen(key));
        tmpNode->value = (char*)malloc(strlen(value));

        strcpy(tmpNode->key, key);
        strcpy(tmpNode->value, value);
    }
    return tmpNode;
}

AppendLinkedNode finds the last node in the linked list and appends a new node to it.

The code would be used like this:

if(InitHashTable(100) == false) return -1;
AddNode("10", "TEN");

Finding a node is a simple as:

HashTable* FindNode(char* key)
{
    int offset = ComputeHash(key);
    HashTable* tmpNode = hashTable.hashBucketArray[offset];
    while(tmpNode != NULL)
    {
        if(strcmp(tmpNode->key, key) == 0)
            return tmpNode;
        tmpNode = tmpNode->nextEntry;
    }
    return NULL;
}

And is used as follows:

char* value = FindNode("10");

回答2:

A Java implementation in < 60 LoC

import java.util.ArrayList;
import java.util.List;
import java.util.Random;


public class HashTable {

    class KeyValuePair {

        Object key;

        Object value;

        public KeyValuePair(Object key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    private Object[] values;

    private int capacity;

    public HashTable(int capacity) {
        values = new Object[capacity];
        this.capacity = capacity;
    }

    private int hash(Object key) {
        return Math.abs(key.hashCode()) % capacity;
    }

    public void add(Object key, Object value) throws IllegalArgumentException {

        if (key == null || value == null)
            throw new IllegalArgumentException("key or value is null");

        int index = hash(key);

        List<KeyValuePair> list;
        if (values[index] == null) {
            list = new ArrayList<KeyValuePair>();
            values[index] = list;

        } else {
            // collision
            list = (List<KeyValuePair>) values[index];
        }

        list.add(new KeyValuePair(key, value));
    }

    public Object get(Object key) {
        List<KeyValuePair> list = (List<KeyValuePair>) values[hash(key)];
        if (list == null) {
            return null;
        }
        for (KeyValuePair kvp : list) {
            if (kvp.key.equals(key)) {
                return kvp.value;
            }
        }
        return null;
    }

    /**
     * Test
     */
    public static void main(String[] args) {

        HashTable ht = new HashTable(100);

        for (int i = 1; i <= 1000; i++) {
            ht.add("key" + i, "value" + i);
        }

        Random random = new Random();
        for (int i = 1; i <= 10; i++) {
            String key = "key" + random.nextInt(1000);
            System.out.println("ht.get(\"" + key + "\") = " + ht.get(key));
        }   
    }
}

回答3:

A HashTable is a really simple concept: it is an array in which key and value pairs are placed into, (like an associative array) by the following scheme:

A hash function hashes the key to a (hopefully) unused index into the array. the value is then placed into the array at that particular index.

Data retrieval is easy, as the index into the array can be calculated via the hash function, thus look up is ~ O(1).

A problem arises when a hash function maps 2 different keys to the same index...there are many ways of handling this which I will not detail here...

Hash tables are a fundamental way of storing and retrieving data quickly, and are "built in" in nearly all programming language libraries.

回答4:

I was looking for a completely portable C hash table implementation and became interested in how to implement one myself. After searching around a bit I found: Julienne Walker's The Art of Hashing which has some great tutorials on hashing and hash tables. Implementing them is a bit more complex than I thought but it was a great read.

回答5:

Here is my code for a Hash Table, implemented in Java. Suffers from a minor glitch- the key and value fields are not the same. Might edit that in the future.

public class HashTable
{
    private LinkedList[] hashArr=new LinkedList[128];
    public static int HFunc(int key)
    {
        return key%128;
    }


    public boolean Put(Val V)
    {

        int hashval = HFunc(V.getKey());
        LinkedNode ln = new LinkedNode(V,null);
        hashArr[hashval].Insert(ln);
        System.out.println("Inserted!");
        return true;            
    }

    public boolean Find(Val V)
    {
        int hashval = HFunc(V.getKey());
        if (hashArr[hashval].getInitial()!=null && hashArr[hashval].search(V)==true)
        {
            System.out.println("Found!!");
            return true;
        }
        else
        {
            System.out.println("Not Found!!");
            return false;
        }

    }
    public boolean delete(Val v)
    {
        int hashval = HFunc(v.getKey());
        if (hashArr[hashval].getInitial()!=null && hashArr[hashval].delete(v)==true)
        {
            System.out.println("Deleted!!");
            return true;
        }
        else 
        {
            System.out.println("Could not be found. How can it be deleted?");
            return false;
        }
    }

    public HashTable()
    {
        for(int i=0; i<hashArr.length;i++)
            hashArr[i]=new LinkedList();
    }

}

class Val
{
    private int key;
    private int val;
    public int getKey()
    {
        return key;
    }
    public void setKey(int k)
    {
        this.key=k;
    }
    public int getVal()
    {
        return val;
    }
    public void setVal(int v)
    {
        this.val=v;
    }
    public Val(int key,int value)
    {
        this.key=key;
        this.val=value;
    }
    public boolean equals(Val v1)
    {
        if (v1.getVal()==this.val)
        {
            //System.out.println("hello im here");
            return true;
        }
        else 
            return false;
    }
}

class LinkedNode
{
    private LinkedNode next;
    private Val obj;
    public LinkedNode(Val v,LinkedNode next)
    {
        this.obj=v;
        this.next=next;
    }
    public LinkedNode()
    {
        this.obj=null;
        this.next=null;
    }
    public Val getObj()
    {
        return this.obj;    
    }
    public void setNext(LinkedNode ln)
    {
        this.next = ln;
    }

    public LinkedNode getNext()
    {
        return this.next;
    }
    public boolean equals(LinkedNode ln1, LinkedNode ln2)
    {
        if (ln1.getObj().equals(ln2.getObj()))
        {
            return true;
        }
        else 
            return false;

    }

}

class LinkedList
{
    private LinkedNode initial;
    public LinkedList()
    {
        this.initial=null;
    }
    public LinkedList(LinkedNode initial)
    {
        this.initial = initial;
    }
    public LinkedNode getInitial()
    {
        return this.initial;
    }
    public void Insert(LinkedNode ln)
    {
        LinkedNode temp = this.initial;
        this.initial = ln;
        ln.setNext(temp);
    }
    public boolean search(Val v)
    {
        if (this.initial==null)
            return false;
        else 
        {
            LinkedNode temp = this.initial;
            while (temp!=null)
            {
                //System.out.println("encountered one!");
                if (temp.getObj().equals(v))
                    return true;
                else 
                    temp=temp.getNext();
            }
            return false;
        }

    }
    public boolean delete(Val v)
    {
        if (this.initial==null)
            return false;
        else 
        {
            LinkedNode prev = this.initial;
            if (prev.getObj().equals(v))
            {
                this.initial = null;
                return true;
            }
            else
            {
                LinkedNode temp = this.initial.getNext();
            while (temp!=null)
            {
                if (temp.getObj().equals(v))
                {
                    prev.setNext(temp.getNext());
                    return true;
                }
                else
                {
                    prev=temp;
                    temp=temp.getNext();
                }
            }
            return false;
            }
        }
    }
}

回答6:

I think you need to be a little more specific. There are several variations on hashtables with regards to the following options

Is the hashtable fixed-size or dynamic?
What type of hash function is used?
Are there any performance constraints when the hashtable is resized?

The list can go on and on. Each of these constraints could lead to multiple implementations in any language.

Personally, I would just use the built-in hashtable that is available in my language of choice. The only reason I would even consider implementing my own would be due to performance issues, and even then it is difficult to beat most existing implementations.

回答7:

For those who may use the code above, note the memory leak:

tmpNode->key   = (char*)malloc(strlen(key)+1);   //must have +1 for '\0'
tmpNode->value = (char*)malloc(strlen(value)+1); //must have +1
strcpy(tmpNode->key, key);
strcpy(tmpNode->value, value);

And to complete the code:

HashNode* AppendLinkedNode(HashNode* ptr, char* key, char* value)
{
    //follow pointer till end
    while (ptr->nextEntry!=NULL)
    {
        if (strcmp(ptr->value,value)==0) return NULL;
        ptr=ptr->nextEntry;
    }
    ptr->nextEntry=newNode(key,value);
    return ptr->nextEntry;
}

回答8:

Minimal implementation in F# as a function that builds a hash table from a given sequence of key-value pairs and returns a function that searches the hash table for the value associated with the given key:

> let hashtbl xs =
    let spine = [|for _ in xs -> ResizeArray()|]
    let f k = spine.[abs(k.GetHashCode()) % spine.Length]
    Seq.iter (fun (k, v) -> (f k).Add (k, v)) xs
    fun k -> Seq.pick (fun (k', v) -> if k=k' then Some v else None) (f k);;
val hashtbl : seq<'a * 'b> -> ('a -> 'b) when 'a : equality

回答9:

I went and read some of the Wikipedia-page on hashing: http://en.wikipedia.org/wiki/Hash_table. It seems like a lot of work, to put up code for a hashtable here, especially since most languages I use allready have them built in. Why would you want implementations here? This stuff really belongs in a languages library.

Please elaborate on what your expected solutions should include:

hash function
variable bucket count
collision behavior

Also state what the purpose of collecting them here is. Any serious implementation will easily be quite a mouthfull = this will lead to very long answers (possibly a few pages long each). You might also be enticing people to copy code from a library...