While indexing my document with Lucene's StandardAnalyzer, I ran into a problem.
For example:
My document contains the word "plag-iarism", and the analyzer indexes it as two tokens, "plag" and "iarism". I want it indexed as the single word "plagiarism". What do I have to do to get the whole word?
StandardAnalyzer delegates tokenization to StandardTokenizer.
You can create your own tokenizer to match your exact needs (you could base it on StandardTokenizer).
Alternatively, if you prefer, you could do a dirty hack: run String.replaceAll() with the relevant regular expression on the text just before the analyzer runs. Yeah. Ugly.
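
Here is a rough sketch of that pre-processing hack in plain Java. The class name HyphenJoiner and the method joinHyphenated are made up for illustration, not part of Lucene. It assumes you only want to drop a hyphen that sits directly between two letters, so " - " and "3-4" are left alone. It does not touch Lucene at all; you would pass the cleaned string to your field as usual.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: strips intra-word hyphens
// from the raw text before it is handed to the analyzer.
public class HyphenJoiner {

    // A hyphen with a letter immediately before and after it.
    // Lookbehind/lookahead keep the letters themselves out of the match.
    private static final Pattern INTRA_WORD_HYPHEN =
            Pattern.compile("(?<=\\p{L})-(?=\\p{L})");

    public static String joinHyphenated(String text) {
        return INTRA_WORD_HYPHEN.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "a case of plag-iarism - see page 3-4";
        // prints: a case of plagiarism - see page 3-4
        System.out.println(joinHyphenated(raw));
    }
}
```

Be aware that this also glues genuine compounds together ("well-known" becomes "wellknown"), so it only makes sense if your hyphens are mostly line-break artifacts like the one in the question.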