What's the Hi/Lo algorithm?

2018-12-31 08:38发布

What's the Hi/Lo algorithm?

I've found this in the NHibernate documentation (it's one method to generate unique keys, section 5.1.4.2), but I haven't found a good explanation of how it works.

I know that Nhibernate handles it, and I don't need to know the inside, but I'm just curious.

5条回答
永恒的永恒
2楼-- · 2018-12-31 08:39

The hi/lo algorithms splits the sequences domain into “hi” groups. A “hi” value is assigned synchronously. Every “hi” group is given a maximum number of “lo” entries, that can by assigned off-line without worrying about concurrent duplicate entries.

  1. The “hi” token is assigned by the database, and two concurrent calls are guaranteed to see unique consecutive values
  2. Once a “hi” token is retrieved we only need the “incrementSize” (the number of “lo” entries)
  3. The identifiers range is given by the following formula:

    [(hi -1) * incrementSize) + 1, (hi * incrementSize) + 1)
    

    and the “lo” value will be in the range:

    [0, incrementSize)
    

    being applied from the start value of:

    [(hi -1) * incrementSize) + 1)
    
  4. When all “lo” values are used, a new “hi” value is fetched and the cycle continues

You can find a more detailed explanation in this article:

And this visual presentation is easy to follow as well:

enter image description here

While hi/lo optimizer is fine for optimizing identifier generation, it doesn't play well with other systems inserting rows into our database, without knowing anything about our identifier strategy.

Hibernate offers the pooled-lo optimizer, which combines a hi/lo generator strategy with an interoperability sequence allocation mechanism. This optimizer is both efficient and interoperable with other systems, being a better candidate than the previous legacy hi/lo identifier strategy.

查看更多
谁念西风独自凉
3楼-- · 2018-12-31 08:40

The basic idea is that you have two numbers to make up a primary key- a "high" number and a "low" number. A client can basically increment the "high" sequence, knowing that it can then safely generate keys from the entire range of the previous "high" value with the variety of "low" values.

For instance, supposing you have a "high" sequence with a current value of 35, and the "low" number is in the range 0-1023. Then the client can increment the sequence to 36 (for other clients to be able to generate keys while it's using 35) and know that keys 35/0, 35/1, 35/2, 35/3... 35/1023 are all available.

It can be very useful (particularly with ORMs) to be able to set the primary keys on the client side, instead of inserting values without primary keys and then fetching them back onto the client. Aside from anything else, it means you can easily make parent/child relationships and have the keys all in place before you do any inserts, which makes batching them simpler.

查看更多
柔情千种
4楼-- · 2018-12-31 08:42

I found the Hi/Lo algorithm is perfect for multiple databases with replication scenarios based in my experience. Imagine this. you have a server in New York (alias 01) and another server in Los Angeles (alias 02) then you have a PERSON table... so in New York when a person is create... you always use 01 as the HI value and the LO value is the next secuential. por example.

  • 010000010 Jason
  • 010000011 David
  • 010000012 Theo

in Los Angeles you always use the HI 02. for example:

  • 020000045 Rupert
  • 020000046 Oswald
  • 020000047 Mario

So, when you use the database replication (no matter what brand) all the primary keys and data combine easily and naturally without to worry about duplicate primary keys, collissions, etc.

This is the best way to go in this scenario.

查看更多
倾城一夜雪
5楼-- · 2018-12-31 08:45

Better than the Hi-Lo allocator, is the "Linear Chunk" allocator. This uses a similar table-based principle but allocates small, conveniently-sized chunks & generates nice human-friendly values.

create table KEY_ALLOC (
    SEQ varchar(32) not null,
    NEXT bigint not null,
    primary key (SEQ)
);

To allocate the next, say, 20 keys (which are then held as a range in the server & used as needed):

select NEXT from KEY_ALLOC where SEQ=?;
update KEY_ALLOC set NEXT=(old value+20) where SEQ=? and NEXT=(old value);

Providing you can commit this transaction (use retries to handle contention), you have allocated 20 keys & can dispense them as needed.

With a chunk-size of just 20, this scheme is 10x faster than allocating from an Oracle sequence, and is 100% portable amongst all databases. Allocation performance is equivalent to hi-lo.

Unlike Ambler's idea, it treats the keyspace as a contiguous linear numberline.

This avoids the impetus for composite keys (which were never really a good idea) and avoids wasting entire lo-words when the server restarts. It generates "friendly", human-scale key values.

Mr Ambler's idea, by comparison, allocates the high 16- or 32-bits, and generates large human-unfriendly key values as the hi-words increment.

Comparison of allocated keys:

Linear_Chunk       Hi_Lo
100                65536
101                65537
102                65538
.. server restart
120                131072
121                131073
122                131073
.. server restart
140                196608

Design-wise, his solution is fundamentally more complex on the number-line (composite keys, large hi_word products) than Linear_Chunk while achieving no comparative benefit. His design is thus mathematically proven deficient.

查看更多
无与为乐者.
6楼-- · 2018-12-31 09:00

In addition to Jon's answer:

It is used to be able to work disconnected. A client can then ask the server for a hi number and create objects increasing the lo number itself. It does not need to contact the server until the lo range is used up.

查看更多
登录 后发表回答