Create a unique primary key (hash) from database c

2019-07-11 21:14发布

I have this table which doesn't have a primary key.

I'm going to insert some records in a new table to analyze them and I'm thinking in creating a new primary key with the values from all the available columns.

If this were a programming language like Java I would:

 int hash = column1 * 31 + column2 * 31 + column3*31 

Or something like that. But this is SQL.

How can I create a primary key from the values of the available columns? It won't work for me to simply mark all the columns as PK, for what I need to do is to compare them with data from other DB table.

My table has 3 numbers and a date.

EDIT What my problem is

I think a bit more of background is needed. I'm sorry for not providing it before.

I have a database ( dm ) that is being updated everyday from another db ( original source ) . It has records form the past two years.

Last month ( july ) the update process got broken and for a month there was no data being updated into the dm.

I manually create a table with the same structure in my Oracle XE, and I copy the records from the original source into my db ( myxe ) I copied only records from July to create a report needed by the end of the month.

Finally on aug 8 the update process got fixed and the records which have been waiting to be migrated by this automatic process got copied into the database ( from originalsource to dm ).

This process does clean up from the original source the data once it is copied ( into dm ).

Everything look fine, but we have just realize that an amount of the records got lost ( about 25% of july )

So, what I want to do is to use my backup ( myxe ) and insert into the database ( dm ) all those records missing.

The problem here are:

  • They don't have a well defined PK.
  • They are in separate databases.

So I thought that If I could create a unique pk from both tables which gave the same number I could tell which were missing and insert them.

EDIT 2

So I did the following in my local environment:

select a.* from the_table@PRODUCTION a , the_table b where
a.idle = b.idle and 
a.activity = b.activity and 
a.finishdate = b.finishdate

Which returns all the rows that are present in both databases ( the .. union? ) I've got 2,000 records.

What I'm going to do next, is delete them all from the target db and then just insert them all s from my db into the target table

I hope I don't get in something worst : - S : -S

4条回答
The star\"
2楼-- · 2019-07-11 21:14

The danger of creating a hash value by combining the 3 numbers and the date is that it might not be unique and hence cannot be used safely as a primary key.

Instead I'd recommend using an autoincrementing ID for your primary key.

查看更多
何必那么认真
3楼-- · 2019-07-11 21:23

Assuming that you have ensured uniqueness...you can do almost the same thing in SQL. The only problem will be the conversion of the date to a numeric value so that you can hash it.

Select Table2.SomeFields 
    FROM Table1 LEFT OUTER JOIN Table2 ON
        (Table1.col1 * 31) + (Table1.col2 * 31) + (Table1.col3 * 31) + 
            ((DatePart(year,Table1.date) + DatePart(month,Table1.date) + DatePart(day,Table1.date) )* 31) = Table2.hashedPk

The above query would work for SQL Server, the only difference for Oracle would be in terms of how you handle the date conversion. Moreover, there are other functions for converting dates in SQL Server as well, so this is by no means the only solution.

And, you can combine this with Quassnoi's SET statement to populate the new field as well. Just use the left side of the Join condition logic for the value.

查看更多
\"骚年 ilove
4楼-- · 2019-07-11 21:39

Just create a surrogate key:

ALTER TABLE mytable ADD pk_col INT

UPDATE  mytable
SET     pk_col = rownum

ALTER TABLE mytable MODIFY pk_col INT NOT NULL

ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)

or this:

ALTER TABLE mytable ADD pk_col RAW(16)

UPDATE  mytable
SET     pk_col = SYS_GUID()

ALTER TABLE mytable MODIFY pk_col RAW(16) NOT NULL

ALTER TABLE mytable ADD CONSTRAINT pk_mytable_pk_col PRIMARY KEY (pk_col)

The latter uses GUID's which are unique across databases, but consume more spaces and are much slower to generate (your INSERT's will be slow)

Update:

If you need to create same PRIMARY KEYs on two tables with identical data, use this:

MERGE
INTO    mytable v
USING   (
        SELECT  rowid AS rid, rownum AS rn
        FROM    mytable
        ORDER BY
                co1l, col2, col3
        )
ON      (v.rowid = rid)
WHEN MATCHED THEN
UPDATE
SET     pk_col = rn

Note that tables should be identical up to a single row (i. e. have same number of rows with same data in them).

Update 2:

For your very problem, you don't need a PK at all.

If you just want to select the records missing in dm, use this one (on dm side)

SELECT  *
FROM    mytable@myxe
MINUS
SELECT  *
FROM    mytable

This will return all records that exist in mytable@myxe but not in mytable@dm

Note that it will shrink all duplicates if any.

查看更多
够拽才男人
5楼-- · 2019-07-11 21:39

If you're loading your new table with values from the old table, and you then need to join the two tables, you can only "properly" do this if you can uniquely identify each row in the original table. Quassnoi's solution will allow you to do this, IF you can first alter the old table by adding a new column.

If you cannot alter the original table, generating some form of hash code based on the columns of the old table would work -- but, again, only if the hash codes uniquely identify each row. (Oracle has checksum functions, right? If so, use them.)

If hash code uniqueness cannot be guaranteed, you may have to settle for a primary key composed of as many columns are required to ensure uniqueness (e.g. the natural key). If there is no natural key, well, I heard once that Oracle provides a rownum for each row of data, could you use that?

查看更多
登录 后发表回答