Hash Table: Why deletion is difficult in open addressing

I am trying to understand the open addressing method. I am referring to T. H. Cormen's book on this topic, which states that deletion is difficult in open addressing. I am completely stuck on this paragraph:

Deletion from an open-address hash table is difficult. When we delete a key from slot i, we cannot simply mark that slot as empty by storing NIL in it. Doing so might make it impossible to retrieve any key k during whose insertion we had probed slot i and found it occupied.

I don't understand this. Please explain it with some examples.

Tags: algorithm hashtable hash

3 answers

傲

Post #2 · 2019-03-09 20:33

Assume hash(x) = hash(y) = hash(z) = i, and assume x was inserted first, then y, then z.
With open addressing and linear probing: table[i] = x, table[i+1] = y, table[i+2] = z.

Now assume you delete x by setting table[i] back to NULL.

When you later search for z, you will find that hash(z) = i and table[i] = NULL, so the search stops there and returns a wrong answer: z is not in the table.

To overcome this, you need to set table[i] to a special marker that tells the search function to keep looking at index i+1, because there might be an element there whose hash is also i.
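The scenario above can be reproduced with a small Python sketch. It assumes linear probing and takes i = 0 as the shared home slot of x, y and z; the helpers `probe_insert`, `probe_search`, `build` and the `DELETED` marker are hypothetical names chosen for illustration, not part of any library.

```python
# Hypothetical helpers; every key is assumed to hash to the same home slot.
DELETED = object()   # special marker, distinct from None (= empty slot)

def probe_insert(table, key, home):
    """Linear probing: try home, home+1, ... and fill the first empty slot."""
    n = len(table)
    for step in range(n):
        j = (home + step) % n
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("table is full")

def probe_search(table, key, home):
    """Return key's slot, or None once an empty slot ends the probe."""
    n = len(table)
    for step in range(n):
        j = (home + step) % n
        if table[j] is None:
            return None          # empty slot: the search gives up here
        if table[j] == key:
            return j             # the DELETED marker never matches, so it is skipped
    return None

def build():
    table = [None] * 8
    for key in ("x", "y", "z"):
        probe_insert(table, key, 0)   # x -> slot 0, y -> slot 1, z -> slot 2
    return table

naive = build()
naive[0] = None                       # delete x by setting it back to NULL
print(probe_search(naive, "z", 0))    # None: z wrongly looks absent

marked = build()
marked[0] = DELETED                   # delete x with the special marker instead
print(probe_search(marked, "z", 0))   # 2: z is still found
```

The only difference between the two runs is what gets written into slot 0: an empty slot ends the probe, while the marker lets it continue to slots 1 and 2.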

smile是对你的礼貌

Post #3 · 2019-03-09 20:39

Deletion from a linear-probing, open-addressed hash table is simple. There was pseudocode for it on the Wikipedia Hash Table page for years. I don't know why it isn't there any more, but here is a permalink back to when it was: Old Wikipedia Hash Table page, and here, for your convenience, is the pseudocode:

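function remove(key)
    i := find_slot(key)
    if slot[i] is unoccupied
        return   // key is not in the table
    j := i
    loop
        j := (j+1) modulo num_slots
        if slot[j] is unoccupied
            exit loop
        k := hash(slot[j].key) modulo num_slots
        // k is the home slot of the record at j. Slot i is now a hole.
        // Move the record back into the hole unless k lies cyclically
        // in (i, j], i.e. unless it is still reachable without the hole.
        if (j > i and (k <= i or k > j)) or
           (j < i and (k <= i and k > j))
            slot[i] := slot[j]
            i := j
    mark slot[i] as unoccupied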
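A runnable Python translation of the pseudocode above may make the shifting easier to follow. This is a sketch under stated assumptions: keys are non-negative ints whose home slot is `hash(key) % num_slots`, `None` marks an unoccupied slot, there is no resizing, and the class name `LinearProbeTable` plus the `insert`/`contains` helpers are hypothetical (only `find_slot` and `remove` mirror the pseudocode).

```python
class LinearProbeTable:
    """Open addressing with linear probing and backward-shift deletion (sketch)."""

    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots   # None means "unoccupied"; keys must not be None
        self.count = 0

    def _home(self, key):
        return hash(key) % len(self.slots)

    def find_slot(self, key):
        """Index holding key, or the unoccupied slot where its probe stops."""
        n = len(self.slots)
        i = self._home(key)
        # Terminates because insert() always leaves at least one slot unoccupied.
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % n
        return i

    def insert(self, key):
        i = self.find_slot(key)
        if self.slots[i] is None:
            if self.count + 1 >= len(self.slots):
                raise RuntimeError("refusing to fill the last unoccupied slot")
            self.count += 1
        self.slots[i] = key

    def contains(self, key):
        return self.slots[self.find_slot(key)] == key

    def remove(self, key):
        n = len(self.slots)
        i = self.find_slot(key)
        if self.slots[i] is None:
            return                         # key is not in the table
        j = i
        while True:
            j = (j + 1) % n
            if self.slots[j] is None:
                break                      # reached the end of the cluster
            k = self._home(self.slots[j])  # home slot of the record at j
            # Same test as the pseudocode: move it back into the hole at i
            # unless k lies cyclically in (i, j].
            if (j > i and (k <= i or k > j)) or (j < i and (k <= i and k > j)):
                self.slots[i] = self.slots[j]
                i = j
        self.slots[i] = None
        self.count -= 1


t = LinearProbeTable(8)
for key in (1, 9, 17, 2):   # 1, 9, 17 share home slot 1; 2 is pushed to slot 4
    t.insert(key)
t.remove(1)
print(t.slots)   # [None, 9, 17, 2, None, None, None, None]
```

After removing 1, each later record in the cluster slides back one slot, so no tombstone is needed and every remaining key is still found from its home slot.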

That page also links to some real code. I believe this has exactly the same performance characteristics as insertion.

This method of deletion is better than the widely used 'mark deleted and occasionally rehash everything' approach because it runs in (expected) constant time per deletion rather than amortized constant time. If you have a hash table of a million items that you keep adding to and deleting from, then with the 'mark deleted' method an occasional add or delete is going to take a million times longer than the ones before and after it, which is not a good performance characteristic.
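To make the "occasional expensive operation" concrete, here is a hypothetical Python sketch of the 'mark deleted and occasionally rehash everything' approach. Assumptions: linear probing, small non-negative int keys (which hash to themselves in Python), a rebuild once more than `max_tombstones` tombstones accumulate (the threshold of 4 is arbitrary), and a `work` counter of slots touched by the most recent call, used here as a stand-in for running time. All names are illustrative.

```python
DELETED = object()   # tombstone: "deleted, keep probing"

class LazyDeleteTable:
    """Linear probing with tombstones and a periodic full rebuild (sketch)."""

    def __init__(self, size=16, max_tombstones=4):
        self.size = size
        self.max_tombstones = max_tombstones   # arbitrary rebuild threshold
        self.slots = [None] * size
        self.tombstones = 0
        self.work = 0    # slots touched by the most recent public call

    def _find(self, key):
        """Index of key, or None. Tombstones are stepped over; None ends the probe."""
        j = hash(key) % self.size
        for _ in range(self.size):
            self.work += 1
            s = self.slots[j]
            if s is None:
                return None
            if s is not DELETED and s == key:
                return j
            j = (j + 1) % self.size
        return None

    def _put(self, key):
        # Simplified: always uses the first truly empty slot (no tombstone reuse).
        j = hash(key) % self.size
        for _ in range(self.size):
            self.work += 1
            if self.slots[j] is None:
                self.slots[j] = key
                return
            j = (j + 1) % self.size
        raise RuntimeError("no empty slot left")

    def insert(self, key):
        self.work = 0
        if self._find(key) is None:
            self._put(key)

    def contains(self, key):
        self.work = 0
        return self._find(key) is not None

    def delete(self, key):
        self.work = 0
        j = self._find(key)
        if j is None:
            return False
        self.slots[j] = DELETED            # stepping stone keeps other probe chains intact
        self.tombstones += 1
        if self.tombstones > self.max_tombstones:
            self._rebuild()                # the occasional O(size) delete
        return True

    def _rebuild(self):
        live = [s for s in self.slots if s is not None and s is not DELETED]
        self.work += self.size             # one pass over every slot
        self.slots = [None] * self.size
        self.tombstones = 0
        for key in live:
            self._put(key)


t = LazyDeleteTable(size=16, max_tombstones=4)
for key in range(10):
    t.insert(key)
costs = []
for key in range(6):
    t.delete(key)
    costs.append(t.work)
print(costs)   # [1, 1, 1, 1, 22, 1]
```

Most deletes touch one slot, but the fifth one pays for a pass over the whole table. That spike grows with the table size, which is the behaviour the answer contrasts with backward-shift deletion.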

爷、活的狠高调

Post #4 · 2019-03-09 20:55

In an open addressing scheme, a lookup performs a series of probes until either the key is found or an empty slot is found.

If finding a key involves a chain of several probes, that key will be lost (not findable) if one of the other keys somewhere along the chain is removed, leaving an empty slot where a stepping stone was needed.

The usual solution is to delete a key by marking its slot as available-for-reuse-but-not-actually-empty. In other words, a replacement stepping stone is added so that probe chains to other keys aren't cut short.
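The "available for reuse, but not actually empty" idea can be sketched in Python as follows. Assumptions: linear probing, small non-negative int keys, and hypothetical names (`TombstoneTable`, `AVAILABLE`, `_scan`). Lookups step over an `AVAILABLE` slot; an insert may reuse the first one it passes, but only after the probe has confirmed the key is not already stored further along the chain, otherwise a duplicate could be created.

```python
AVAILABLE = object()   # "available for reuse, but not actually empty"

class TombstoneTable:
    """Linear probing where deleted slots become reusable stepping stones (sketch)."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def _scan(self, key):
        """Return (index of key or None, first reusable slot or None)."""
        n = len(self.slots)
        j = hash(key) % n
        reusable = None
        for _ in range(n):
            s = self.slots[j]
            if s is None:                  # truly empty: the key cannot be further on
                return None, (j if reusable is None else reusable)
            if s is AVAILABLE:
                if reusable is None:
                    reusable = j           # remember it, but keep probing
            elif s == key:
                return j, reusable
            j = (j + 1) % n
        return None, reusable

    def contains(self, key):
        return self._scan(key)[0] is not None

    def insert(self, key):
        found, free = self._scan(key)
        if found is not None:
            return found                   # already present
        if free is None:
            raise RuntimeError("table is full")
        self.slots[free] = key
        return free

    def delete(self, key):
        found, _ = self._scan(key)
        if found is not None:
            self.slots[found] = AVAILABLE  # a replacement stepping stone
        return found is not None


t = TombstoneTable(8)
for key in (0, 8, 16):        # all three have home slot 0 -> slots 0, 1, 2
    t.insert(key)
t.delete(8)                   # slot 1 becomes AVAILABLE, not empty
print(t.contains(16))         # True: the probe steps over slot 1
print(t.insert(24))           # 1: the AVAILABLE slot is reused
print(t.insert(16))           # 2: already present, so no duplicate is created
```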

Hope this helps your understanding.
