Oracle Text Contains and technical content

2019-07-25 15:52发布

I'am searching for the technical word "AN-XYZ99". So I use

SELECT *
FROM foo
WHERE CONTAINS(bar, 'AN{-}XYZ99') > 0

but I get also results like "FO-XYZ99" or "BAR-XYZ99". What can I do to ensure the expected result?

I used

BEGIN
    CTX_DDL.CREATE_PREFERENCE('FOO','BASIC_LEXER');
    CTX_DDL.SET_ATTRIBUTE('FOO', 'ALTERNATE_SPELLING', 'GERMAN');
    CTX_DDL.SET_ATTRIBUTE('FOO', 'COMPOSITE', 'GERMAN');
    CTX_DDL.SET_ATTRIBUTE('FOO', 'MIXED_CASE', 'NO');
END;

Sample data from column "bar" (VARCHAR2(4000)):

"unbekannt Stadt Text: AN-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 foo"
"unbekannt Stadt Text: FO-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bar"
"unbekannt Stadt Text: BAR-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bla"

With the Statement above I would like the first row as output but I get the second and third row as well.

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

1条回答
放荡不羁爱自由
2楼-- · 2019-07-25 16:45

First you must define hyphen as a printjoin in your lexer.

check it with

select IXV_ATTRIBUTE, IXV_VALUE from CTXSYS.CTX_INDEX_VALUES where IXV_CLASS =  'LEXER';

IXV_ATTRIBUTE                  IXV_VALUE     
-----------------------------------------
PRINTJOINS                     _$%&-         
NUMJOIN                        .              
NUMGROUP                       .              
WHITESPACE                     ,= 

Then you may (after re-creating index with this lexer) validate that the tokens are as expected: (your table would vary based on the index name; check all tables like 'DR$%$I')

select TOKEN_TEXT from DR$TEXTIDX_IDX$I where TOKEN_TEXT like '%-XYZ99';
TOKEN_TEXT                                                     
----------------------------------------------------------------
AN-XYZ99                                                         
BAR-XYZ99                                                        
FO-XYZ99

Now you may query for the search string.

Aparently you must escape the hyphen as BAR-XYZ99 will find rows with BAR not containing XYZ99 ; although the documentation of hyphen with no space is a bit different.

SELECT SCORE(1),txt
FROM textidx
WHERE  CONTAINS(txt, 'BAR-XYZ99',1) > 0; 

  SCORE(1) TXT                                                                                
---------- ------------------------------------------------------------------------------------
         4 unbekannt Stadt Text: FO-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bar

For some reason (I'm on 11.2.0.2.0) the escaping with curly braces doesn't work (returns no match), but using backslash is fine.

SELECT SCORE(1),txt
FROM textidx
WHERE  CONTAINS(txt, 'BAR\-XYZ99',1) > 0;  

  SCORE(1) TXT                                                                                
---------- ------------------------------------------------------------------------------------
         4 unbekannt Stadt Text: BAR-XYZ99 << foobar Straße 31.12.2017 Datum Host 20160101 bla 
查看更多
登录 后发表回答