We are working with ASE SYBASE 15.7
We have a huge historical table: 700 millions of records and one CLUSTER INDEX: c1,c2,c3,c4 with ALLOW_DUP_ROWS
Every month we load 700,000 records en the table via BCP IN
These records had a lot of duplicate records because fields: c2 and c3 had the same value for all records
The time of BCP IN was 6 hours
We changed the logical of INSERT's in the monthly table and now all of fields of CLUSTER INDEX: c1,c2,c3,c4 are diferents.
After this change, the BCP IN process takes 2 hours!
Why the time now is lower? We are not clear about the reasons
Thanks a lot!
Rod
I'm guessing your table's locking scheme is allpages
.
If a clustered index on a allpages
table allows dup keys/rows, the dup key/row entries are maintained in what's known as overflow pages (while from an older manual, this info on overflow pages is still accurate).
Each time a dup key/row is inserted into the table, the entire chain of overflow pages is scanned from beginning to end, and then the new entry is added on the end of the chain.
As you can imagine, as this chain grows longer and longer, it takes more and more time for each successive insert to occur.
If you have a history of your data loads I'm guessing you'll find that over time, as the volume of (dup) rows grew, so did the load times ... and this would've been attributable to the greater amount of time required to scan ever increasing overflow page chains.
By making the new data 'unique' you've likely eliminated most of the overhead for scanning overflow page chains. I say 'likely' because it's not clear from your post if some of the new data could contain key values that already exist in the table (ie, the keys may be unique within a given batch of data to be inserted, but the keys aren't necessarily unique within the table).
I usually recommend that clients do NOT place non-unique clustered indexes on allpages
locked tables solely because of the overhead for overflow page chain processing.
Chances are that if you eliminated the non-unique clustered index then you may see your data loads run even quicker, eg:
- convert table to datarows locking; the comparable CLUSTERED w/ dup rows does not use costly overflow page chains (though new data will no longer be maintained in physical order by index key)
- drop/replace the clustered index with a non-clustered index; non-clustered indexes do not use costly overflow page chains