ANSI C hash table implementation with data in one memory block

Posted 2019-03-31 19:45

I am looking for an open-source C implementation of a hash table that keeps all the data in one memory block, so that it can easily be sent over a network, for example. I can only find implementations that allocate a small piece of memory for every key-value pair added.

Thank you very much in advance for any input.

EDIT: It doesn't necessarily need to be a hash table; any key-value table would probably do.
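To illustrate what I mean, here is a minimal sketch of such a single-block table, assuming fixed maximum key/value sizes (all sizes, names, and the choice of djb2 hashing with linear probing are just illustrative, not taken from any particular library):

```c
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 32
#define VAL_LEN 64

struct slot {
    char key[KEY_LEN];            /* key[0] == '\0' marks a free slot */
    char val[VAL_LEN];
};

struct table {
    unsigned long nslots;         /* fixed capacity */
    struct slot slots[1];         /* C89 "struct hack": real array follows */
};

static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static struct table *table_create(unsigned long nslots)
{
    /* header plus slot array in one contiguous allocation */
    size_t bytes = sizeof(struct table)
                 + (nslots - 1) * sizeof(struct slot);
    struct table *t = calloc(1, bytes);
    if (t)
        t->nslots = nslots;
    return t;
}

static int table_put(struct table *t, const char *key, const char *val)
{
    unsigned long n, i = djb2(key) % t->nslots;
    for (n = 0; n < t->nslots; n++) {             /* linear probing */
        struct slot *s = &t->slots[(i + n) % t->nslots];
        if (s->key[0] == '\0' || strcmp(s->key, key) == 0) {
            strncpy(s->key, key, KEY_LEN - 1);    /* long keys truncated */
            strncpy(s->val, val, VAL_LEN - 1);
            return 0;
        }
    }
    return -1;                                    /* table full */
}
```

The whole table then occupies `sizeof(struct table) + (nslots - 1) * sizeof(struct slot)` contiguous bytes starting at `t`, so a single write()/send() ships it.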

4 Answers
\"骚年 ilove
2楼-- · 2019-03-31 20:21

On a Unix system I'd probably utilise a shared-memory buffer (see shm_open()), or, if that's not available, a memory-mapped file with the MAP_SHARED flag; be aware of the OS-specific differences, though: http://en.wikipedia.org/wiki/Mmap

If neither shm_open() nor mmap() is available, you could still use a file on disk (to some extent), but you'd have to take care of proper locking. I'd send an unlock signal to the next process, perhaps together with the offset of the updated portion of the file; that process then locks the file again, seeks to the interesting part, and proceeds as usual (updates/deletes/etc.).

In any case, you can freely design the layout of the hash table, for example with fixed-width key/offset pairs. That gives you fast access to the keys of your hash table, and when necessary you seek to the data portion and copy/delete/modify it.

Ideally this file should be on a RAM disk, of course.
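A rough sketch of the shm_open() route, assuming a fixed-width layout; the object name "/kvtable" and the sizes are illustrative, and on some systems you need to link with -lrt:

```c
#include <stddef.h>
#include <fcntl.h>     /* O_* constants */
#include <sys/mman.h>  /* shm_open, mmap */
#include <unistd.h>    /* ftruncate, close */

#define NSLOTS    1024
#define SLOT_SIZE 96   /* e.g. 32-byte key + 64-byte value, fixed width */

int main(void)
{
    size_t bytes = (size_t)NSLOTS * SLOT_SIZE;
    void *table;
    int fd;

    /* create (or open) a named shared-memory object */
    fd = shm_open("/kvtable", O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return 1;
    if (ftruncate(fd, (off_t)bytes) == -1)  /* set the region's size */
        return 1;

    /* MAP_SHARED: other processes opening "/kvtable" see the same block */
    table = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (table == MAP_FAILED)
        return 1;

    /* ... lay out the fixed-width key/value slots inside `table` ... */

    munmap(table, bytes);
    close(fd);
    return 0;
}
```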

狗以群分 · 2019-03-31 20:21

Libraries providing hash tables tend to hide the details and make the thing work efficiently (that is normally what programmers want when they use a hash table), so the way they handle memory is usually hidden from the final programmer's eyes, and programmers shouldn't rely on a particular "memory layout", which may change in a following version of the library.

Write your own functions to serialize (and unserialize) the hash table in the most convenient way for your usage. You can keep the serialized content around if you need it several times (of course, when the hash table is changed, you need to update the serialized "version" kept in memory).
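For instance, a minimal serializer might emit length-prefixed records into one contiguous buffer; the parallel key/value arrays below stand in for whatever iteration API your hash table library actually offers:

```c
#include <stdlib.h>
#include <string.h>

/* Emit v as 4 big-endian bytes so the format is byte-order independent. */
static unsigned char *put_u32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
    return p + 4;
}

/* Pack n key/value pairs into one malloc'd buffer:
 * [count][klen][key bytes][vlen][value bytes]...  (lengths as u32) */
static unsigned char *serialize_pairs(const char **keys, const char **vals,
                                      unsigned long n, size_t *out_len)
{
    size_t need = 4;
    unsigned long i;
    unsigned char *buf, *p;

    for (i = 0; i < n; i++)
        need += 4 + strlen(keys[i]) + 4 + strlen(vals[i]);

    buf = malloc(need);
    if (buf == NULL)
        return NULL;

    p = put_u32(buf, n);                      /* record count first */
    for (i = 0; i < n; i++) {
        size_t klen = strlen(keys[i]), vlen = strlen(vals[i]);
        p = put_u32(p, (unsigned long)klen);
        memcpy(p, keys[i], klen); p += klen;
        p = put_u32(p, (unsigned long)vlen);
        memcpy(p, vals[i], vlen); p += vlen;
    }
    *out_len = need;
    return buf;
}
```

The receiver walks the same records and inserts them into its own table, so neither side ever depends on the library's internal memory layout.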

Explosion°爆炸 · 2019-03-31 20:28

I agree completely with akira (+1). Just one more comment on data locality: once the table gets larger, or if the satellite data is large enough, cache pollution will almost certainly slow down every operation on the table even further. Put differently, you can rely on the level-1/2/3 cache hierarchy to serve the key data promptly while putting up with a cache miss when you have to access the satellite data (e.g. for serialisation).
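To make that concrete, here is a sketch (sizes illustrative) of a layout that keeps the densely packed keys separate from the large satellite values, so probing stays cache-friendly:

```c
#include <string.h>

#define NSLOTS  1024
#define KEY_LEN 32

struct value { char payload[256]; };   /* large, "cold" satellite data */

struct table {
    char keys[NSLOTS][KEY_LEN];        /* hot: scanned on every probe    */
    struct value vals[NSLOTS];         /* cold: touched only after a hit */
};

static unsigned long hash_str(const char *s)   /* djb2, illustrative */
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Probing walks only the packed key array: a 64-byte cache line holds two
 * whole keys, whereas interleaving the 256-byte values between the keys
 * would waste most of every line fetched during the probe sequence. */
static struct value *table_get(struct table *t, const char *key)
{
    unsigned long n, i = hash_str(key) % NSLOTS;
    for (n = 0; n < NSLOTS; n++) {
        unsigned long j = (i + n) % NSLOTS;
        if (t->keys[j][0] == '\0')
            return NULL;               /* empty slot ends the probe chain */
        if (strcmp(t->keys[j], key) == 0)
            return &t->vals[j];        /* first touch of satellite data */
    }
    return NULL;
}
```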

对你真心纯属浪费 · 2019-03-31 20:31

The number of times you would serialize such a data structure (and sending it over a network is serializing as well) versus the number of times you would use it inside your program is pretty low. So most implementations focus on speed rather than on being "maybe easier to serialize".

If all the data were in one allocated memory block, many operations on that data structure would get expensive, because you would have to:

  • reallocate the memory on add operations
  • most likely compact/vacuum on delete operations (so that the one block you like so much stays dense and has no holes)

Most network operations are buffered anyway; just iterate over the keys and send keys + values.
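For example, assuming the fixed-width slot layout sketched in the other answers (any layout with an iterator works the same way), buffered sending might look like:

```c
#include <stdio.h>

#define KEY_LEN 32
#define VAL_LEN 64

struct slot { char key[KEY_LEN]; char val[VAL_LEN]; };

/* Write every occupied slot to `out`, which can be a socket wrapped with
 * fdopen(fd, "w") so that stdio does the buffering for us. */
static int send_table(const struct slot *slots, unsigned long nslots,
                      FILE *out)
{
    unsigned long i;
    for (i = 0; i < nslots; i++) {
        if (slots[i].key[0] == '\0')
            continue;                          /* skip empty slots */
        if (fwrite(slots[i].key, 1, KEY_LEN, out) != KEY_LEN ||
            fwrite(slots[i].val, 1, VAL_LEN, out) != VAL_LEN)
            return -1;
    }
    return fflush(out) == 0 ? 0 : -1;          /* one flush at the end */
}
```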
