I've come across some code in the application I'm working on that makes a database call merely to call the ORA_HASH
function (documentation) on a UUID string. The reason it's doing this is that it needs the value to make a service call to another system that appears to use ORA_HASH
for partitioning.
I would like to know the algorithm ORA_HASH
uses so that I can re-implement it to make a similar service call for an application that won't have access to a real database, let alone Oracle. I've only been able to find what amounts to Oracle API documentation so far.
Just to be super clear: I need to clone ORA_HASH
because that's what another system that's outside of my control uses, and I need to integrate with that system. Yes, it would nice if could use a really standard algorithm, like MD5, but I can't, unless that's what ORA_HASH
is under the covers.
Answers or comments that propose the use of a hash algorithm besides ORA_HASH
are not helpful. This question is specifically about ORA_HASH
, not hashing or partitioning in general.
another system that appears to use ORA_HASH
Well, if it "appears to use" then it makes sense to do a bit of reverse engineering and check what exactly is called and disassemble code of the function.
If you, however, want to dive into Oracle internals then following may help.
First of all, you have to figure out what internal C function is called.
To do that you can execute some long running code in one session.
I did run this
select avg(ora_hash(rownum)) id from
(select rownum from dual connect by rownum <= 1e4),
(select rownum from dual connect by rownum <= 1e4);
It can be PL/SQL code as well, you just need to make sure that you constantly call ora_hash.
While it's running
If you on Windows then you can use ostackprof by TANEL PODER
(https://blog.tanelpoder.com/2008/10/31/advanced-oracle-troubleshooting-guide-part-9-process-stack-profiling-from-sqlplus-using-ostackprof/)
If you on *nix then you can use dtrace
(http://www.oracle.com/technetwork/articles/servers-storage-dev/dtrace-on-linux-1956556.html), Flame Graph (usage scenario https://blog.dbi-services.com/oracle-database-multilingual-engine-mle/)
I tested on Windows and looks like that ora_hash is ...->evaopn2()->evahash()->...
Now let's google for evahash. We got extremely lucky because there is a header file on official site https://oss.oracle.com/projects/ocfs-tools/src/branches/new-dir-format/libocfs/Linux/inc/ocfshash.h with link to evahash.
And finally there is page with actual C code http://burtleburtle.net/bob/hash/evahash.html
So far so good, we remember that we can use external C function in Oracle if we build it into library (DLL on Windows).
For example on my Win x64 if I change function signature to
extern "C" ub4 hash( ub1 *k, ub4 length, ub4 initval)
it can be successfully executed from Oracle.
But, as you see, signature a bit differs from ora_hash in Oracle. This function accepts value, its length and initval (may be seed) while signature in Oracle is ora_hash(expr, max_bucket, seed_value).
Let's try to test
Oracle
SQL> select ora_hash(utl_raw.cast_to_raw('0'), power(2, 32) - 1, 0) oh1,
2 ora_hash('0', power(2, 32) - 1, 0) oh2,
3 ora_hash(0, power(2, 32) - 1, 0) oh3,
4 ora_hash(chr(0), power(2, 32) - 1, 0) oh4
5 from dual;
OH1 OH2 OH3 OH4
---------- ---------- ---------- ----------
3517341953 3517341953 1475158189 4056412421
C
int main()
{
ub1 ta[] = {0};
ub1* t = ta;
cout << hash(t, 1, 0) << endl;
ub1 ta0[] = {'0'};
ub1* t0 = ta0;
cout << hash(t0, 1, 0) << endl;
return 0;
}
1843378377
4052366646
None of the numbers matches.
So what is the problem?
ora_hash accepts parameters of almost any type (for example select ora_hash(sys.odcinumberlist(1,2,3)) from dual
) while C function accepts value as array of bytes. This means that some conversion happens before function call.
Thus before using mentioned C hash function you have to figure out how actual value is transformed before passing to it.
You can proceed with reverse engineering of Oracle binaries using IDA PRO + hex rays but that may take days. Not to mention platform specific details.
So if you want to imitate ora_hash, the easiest option would be to install Oracle express edition and use it to call ora_hash.
I hope that was interesting. Good luck.
Update
ora_hash and dbms_utility.get_hash_value can be mapped to each other (see https://jonathanlewis.wordpress.com/2009/11/21/ora_hash-function/)
SQL> select dbms_utility.get_hash_value('0', 0 + 1, 1e6 + 1) ha1,
2 ora_hash('0', 1e6, 0) + 1 ha2
3 from dual;
HA1 HA2
---------- ----------
338437 338437
If we unwrap package body of dbms_utility we will see following declaration
function get_hash_value(name varchar2, base number, hash_size number)
return number is
begin
return(icd_hash(name, base, hash_size));
end;
and
function icd_hash(name varchar2,
base binary_integer,
hash_size binary_integer) return binary_integer;
pragma interface(c, icd_hash);
Let's google for icd_hash
and we can find that it's mapped to _psdhsh
(https://yurichev.com/blog/50/). Now it's time to disassemble oracle.exe and extract code for _psdhsh
from it. Maybe I'll spend some time on this next year.
This doesn't answer the OP question of the actual algo behind ora_hash. This is just an example of using ora_hash in pl/sql (answering @JonHeller comment):
The function:
SQL> create or replace function get_ora_hash(i_str in varchar2, i_max_bucket in number default 4294967295, i_seed number default 0)
return number deterministic
parallel_enable
as
rv number:= 0;
begin
select ORA_HASH(i_str, i_max_bucket, i_seed)
into rv
from dual;
return rv;
end;
Function created.
And using it:
SQL> declare
l_val number;
begin
l_val := get_ora_hash('test');
dbms_output.put_line(l_val);
end;
PL/SQL procedure successfully completed.
Dbms Output:
2662839991
You can also mess around with RESULT_CACHE or other techniques to try to speed things up even more.
Its very fast already. For example, calling the function 1 million times on a large table:
SQL> set serveroutput on
SQL> declare
l_val number;
l_start_dte timestamp;
l_end_dte timestamp;
l_interval INTERVAL DAY(9) TO SECOND(9);
l_cnt number := 0;
begin
l_start_dte:= systimestamp;
--for rec in (select object_name from dba_objects)
for rec in (select name from my_big_table where rownum <= 1000000)
loop
l_cnt := l_cnt + 1;
l_val := get_ora_hash(rec.name);
end loop;
l_end_dte:= systimestamp;
l_interval := l_end_dte - l_start_dte;
dbms_output.put_line('Rows processed: ' || l_cnt
|| ', Start: ' || l_start_dte
|| ', End: ' || l_end_dte
|| ', Interval: ' || l_interval);
end;
Rows processed: 1000000, Start: 14-DEC-17 02.48.31.138212 PM, End: 14-DEC-17 02.48.41.148884 PM, Interval: +000000000 00:00:10.010672000
PL/SQL procedure successfully completed.
So basically 100k rows per second, that includes any context switches you may be worried about.
If you need to reproduce ORA_HASH because of performance, I would suggest that your performance bottleneck may be elsewhere.