How to sync and optimize an Oracle Text index?

Posted 2020-05-24 05:07

Question:

We want to use the ctxsys.context index type for full-text search, but I was quite surprised that an index of this type is not updated automatically. We have 3 million documents, with about 10k updates/inserts/deletes per day.

What are your recommendations for syncing and optimizing an Oracle Text index?

Answer 1:

I think the 'SYNC EVERY' option, as described in the previous answer, is only available in Oracle 10g or newer. If you're using an older version of Oracle, you have to run the sync operation periodically. For example, you can create the following stored procedure:

CREATE OR REPLACE PROCEDURE sync_ctx_indexes
IS
  -- Every index that has rows waiting to be synchronized
  CURSOR sql1 IS
    SELECT DISTINCT pnd_index_owner || '.' || pnd_index_name AS index_name
    FROM ctx_pending;
BEGIN
  FOR rec1 IN sql1 LOOP
    ctx_ddl.sync_index(rec1.index_name);
  END LOOP;
END;
/

and then schedule it to run via DBMS_JOB (the interval 'SYSDATE + 1/720' means every 2 minutes):

DBMS_JOB.SUBMIT(job_id, 'sync_ctx_indexes;', SYSDATE, 'SYSDATE + 1/720');
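
Run from SQL*Plus, the call above needs a declared variable to receive the job number, and the job is only queued once the transaction commits. A minimal sketch of the complete call, assuming the sync_ctx_indexes procedure already exists in the current schema:

```sql
DECLARE
  job_id BINARY_INTEGER;  -- OUT parameter: receives the generated job number
BEGIN
  DBMS_JOB.SUBMIT(
    job       => job_id,
    what      => 'sync_ctx_indexes;',   -- PL/SQL to execute
    next_date => SYSDATE,               -- first run: now
    interval  => 'SYSDATE + 1/720');    -- then every 2 minutes
  COMMIT;                               -- the job is not visible to the job queue until commit
  DBMS_OUTPUT.PUT_LINE('Submitted job ' || job_id);
END;
/
```

The job number printed here is what you would later pass to DBMS_JOB.REMOVE to stop the schedule.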

As for index optimization, the following command can be used (it can also be scheduled with DBMS_JOB or via cron):

alter index my_index rebuild online parameters('optimize full maxtime 60');

The CTX_DDL package also provides a procedure with similar functionality, CTX_DDL.OPTIMIZE_INDEX.
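
As a rough sketch of the package route, the same full optimization could be requested through CTX_DDL.OPTIMIZE_INDEX. The index name my_index is carried over from the ALTER INDEX example; the 60-minute limit is likewise just the value used there, not a recommendation:

```sql
BEGIN
  ctx_ddl.optimize_index(
    idx_name => 'my_index',
    optlevel => 'FULL',   -- the constant ctx_ddl.optlevel_full is equivalent
    maxtime  => 60);      -- stop after at most 60 minutes
END;
/
```

A FULL optimization that hits maxtime stops where it is, and the next FULL run should pick up from that point, so a nightly job with a time limit can work through a large index over several runs.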



Answer 2:

What do you mean by "not automatically updated"?

The index can be synchronized on commit or periodically.

Create index ... on ... INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC (ON COMMIT)')
Create index ... on ... INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC (EVERY "SYSDATE+1/24")')

If you don't need real-time search accuracy, our DBA recommended syncing the index periodically, say every 2 minutes. If you can afford to do it overnight, even better. What is best depends on your load and the size of the documents.
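
One way to judge how often to sync is to look at how much work is queued between runs. The query below is only a sketch: it assumes you are connected as the index owner and reads the CTX_USER_PENDING view, which lists rows that have changed but are not yet synchronized:

```sql
-- Rows waiting to be synchronized, per index, with the age of the oldest one
SELECT pnd_index_name,
       COUNT(*)           AS pending_rows,
       MIN(pnd_timestamp) AS oldest_pending
FROM   ctx_user_pending
GROUP  BY pnd_index_name;
```

If pending_rows stays small and oldest_pending is recent, a longer interval is probably acceptable; a steadily growing backlog suggests syncing more often.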

These links can probably provide you with more information:

  • Oracle TEXT index maintenance
  • Working with Oracle TEXT

For DBA advice, Server Fault may be a better place to ask.



Answer 3:

Putting this here as an update for Oracle 12c users. If you use the index in real-time mode, it keeps newly indexed items in memory and periodically pushes them to the main index tables, which keeps fragmentation down and enables near-real-time (NRT) search on streaming content. Here's how to set it up:

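-- Drop any earlier preference of this name (raises an error if it does not exist yet)
exec ctx_ddl.drop_preference('your_tablespace');
-- Despite its name, 'your_tablespace' is a storage preference, not a tablespace
exec ctx_ddl.create_preference('your_tablespace', 'BASIC_STORAGE');
-- STAGE_ITAB enables the staging index table used for near-real-time indexing
exec ctx_ddl.set_attribute('your_tablespace', 'STAGE_ITAB', 'true');
create index some_text_idx on your_table(text_col) indextype is ctxsys.context parameters ('storage your_tablespace sync (on commit)');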

This will set up the index in NRT mode. It's pretty sweet.