Having trouble rereading a Lucene TokenStream

Posted 2019-05-07 11:12

I am using Lucene 4.6, and am apparently unclear on how to reuse a TokenStream, because I get the exception:

java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.

at the start of the second pass. I've read the Javadoc, but I'm still missing something. Here is a simple example that throws the above exception:

@Test
public void list() throws Exception {
  String text = "here are some words";
  TokenStream ts = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));
  listTokens(ts);
  listTokens(ts);
}

public static void listTokens(TokenStream ts) throws Exception {
  CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
  try {
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println("token text: " + termAtt.toString());
    }
    ts.end();
  }
  finally {
    ts.close();
  }
}

I've tried not calling TokenStream.end() or TokenStream.close(), thinking they should perhaps only be called at the very end, but I get the same exception.

Can anyone offer a suggestion?

Tags: lucene
1 answer
Summer. ? 凉城
#2 · 2019-05-07 11:50

The exception message lists calling reset() multiple times as one possible cause, and that is what you are doing. The Tokenizer implementation explicitly disallows this. The java.io.Reader API does not guarantee that every subclass supports reset(), so the Tokenizer cannot assume that the Reader passed to it can be rewound.
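
To illustrate the java.io.Reader point on its own, here is a minimal sketch using only the JDK. It says nothing about Lucene's internals; the class name ReaderResetDemo and the helper canRewind are made up for this example. It reads a Reader to the end and then tries to rewind it with reset():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class ReaderResetDemo {

  // Hypothetical helper: consumes the whole Reader, then tries to rewind it.
  // Returns true if reset() succeeded, false if the Reader rejected it.
  static boolean canRewind(Reader reader) {
    try {
      while (reader.read() != -1) {
        // discard characters until end of input
      }
      reader.reset();
      return true;
    } catch (IOException e) {
      // Reader.reset() throws IOException when the subclass does not support it
      return false;
    }
  }

  public static void main(String[] args) {
    String text = "here are some words";

    // StringReader keeps the whole string in memory and supports reset().
    Reader inMemory = new StringReader(text);

    // InputStreamReader does not override reset(), so the base-class
    // behaviour (throwing IOException) applies.
    Reader streamBacked = new InputStreamReader(
        new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
        StandardCharsets.UTF_8);

    System.out.println("StringReader rewinds: " + canRewind(inMemory));
    System.out.println("InputStreamReader rewinds: " + canRewind(streamBacked));
  }
}
```

A consumer that only knows it was handed "a Reader" cannot tell these two cases apart up front, which is why asking the caller for a fresh Reader is the safer design.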

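You can simply construct a new TokenStream for each pass, or I believe you could call Tokenizer.setReader(Reader) to hand the existing Tokenizer a fresh Reader (in which case you must close() the Tokenizer first) and then run the usual reset(), incrementToken(), end(), close() sequence again.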
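
A rough sketch of the setReader(Reader) route, assuming Lucene 4.6 on the classpath and the same StandardTokenizer constructor as in the question. The class name TokenizerReuse and the helper consume are made up for this example, and I have not checked it against other Lucene versions:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenizerReuse {

  // Hypothetical helper: one complete consuming pass over the tokenizer,
  // following the workflow from the question: reset(), incrementToken()
  // until it returns false, end(), and finally close().
  static List<String> consume(Tokenizer tokenizer) throws IOException {
    List<String> terms = new ArrayList<String>();
    CharTermAttribute termAtt = tokenizer.addAttribute(CharTermAttribute.class);
    try {
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        terms.add(termAtt.toString());
      }
      tokenizer.end();
    } finally {
      tokenizer.close();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    String text = "here are some words";
    Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_46, new StringReader(text));

    // First pass; consume() closes the tokenizer when it is done.
    System.out.println(consume(tokenizer));

    // Supply a fresh Reader before the second pass instead of
    // calling reset() again on the already-consumed input.
    tokenizer.setReader(new StringReader(text));
    System.out.println(consume(tokenizer));
  }
}
```

The only change from the code in the question is the setReader(...) call between the two passes. Constructing a second StandardTokenizer in its place should work just as well and is arguably simpler.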
