public final class CJKTokenizer
extends org.apache.lucene.analysis.Tokenizer
The tokens returned for CJK text are overlapping bigrams: every two adjacent characters form one token.
Example: "java C1C2C3C4" is segmented into "java", "C1C2", "C2C3", "C3C4", where C1 to C4 each stand for one CJK character.
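The overlapping-bigram rule above can be illustrated with a small standalone sketch. It is not the Lucene implementation: the class name `BigramSketch`, the `isCjk` block check, and the choice to emit a lone CJK character as a single-character token are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the Lucene CJKTokenizer source.
final class BigramSketch {

    // Assumption: a character counts as CJK if it falls in one of these
    // common Unicode blocks. A real tokenizer may use a different check.
    static boolean isCjk(char c) {
        Character.UnicodeBlock b = Character.UnicodeBlock.of(c);
        return b == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || b == Character.UnicodeBlock.HIRAGANA
            || b == Character.UnicodeBlock.KATAKANA
            || b == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Splits text into whole non-CJK words and overlapping CJK bigrams.
    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                int start = i;
                while (i < text.length() && isCjk(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    // Assumption: a lone CJK character becomes its own token.
                    tokens.add(text.substring(start, i));
                } else {
                    // Slide a two-character window over the CJK run.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < text.length()
                        && Character.isLetterOrDigit(text.charAt(i))
                        && !isCjk(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "java" followed by four CJK ideographs, written as Unicode escapes.
        System.out.println(segment("java \u65E5\u672C\u8A9E\u6587"));
    }
}
```

For the input in `main`, `segment` returns four tokens: the word "java" followed by three overlapping two-character tokens, matching the "C1C2" "C2C3" "C3C4" pattern of the example.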
Additionally, the following is applied to Latin text (such as English):

Constructor and Description |
---|
CJKTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, java.io.Reader in) |
CJKTokenizer(org.apache.lucene.util.AttributeSource source, java.io.Reader in) |
CJKTokenizer(java.io.Reader in) Construct a token stream processing the given input. |
Modifier and Type | Method and Description |
---|---|
void | end() |
boolean | incrementToken() Returns true for the next token in the stream, or false at EOS. |
void | reset() |
void | reset(java.io.Reader reader) |
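The calling pattern implied by the table (loop while `incrementToken()` returns true, then reuse the instance through `reset(java.io.Reader)`) can be sketched with a stand-in class. `WhitespaceStreamSketch`, its `term()` accessor, and the `drain` helper are hypothetical names invented for this illustration; they only mirror the method names above and say nothing about how the Lucene class is implemented or how its token text is read.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the Lucene class. It splits on whitespace only.
final class WhitespaceStreamSketch {
    private Reader in;
    private final StringBuilder term = new StringBuilder();

    WhitespaceStreamSketch(Reader in) {
        this.in = in;
    }

    // Advances to the next token; returns false once the input is exhausted.
    boolean incrementToken() throws IOException {
        term.setLength(0);
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (term.length() > 0) {
                    return true; // a token just ended
                }
            } else {
                term.append((char) c);
            }
        }
        return term.length() > 0; // final token, or false at end of stream
    }

    // Points the same instance at new input so it can be reused.
    void reset(Reader reader) {
        this.in = reader;
        term.setLength(0);
    }

    // Text of the current token (hypothetical accessor for this sketch).
    String term() {
        return term.toString();
    }

    // Collects all remaining tokens with the usual while-loop.
    static List<String> drain(WhitespaceStreamSketch stream) {
        List<String> out = new ArrayList<>();
        try {
            while (stream.incrementToken()) {
                out.add(stream.term());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        WhitespaceStreamSketch stream =
                new WhitespaceStreamSketch(new StringReader("a b"));
        System.out.println(drain(stream)); // prints [a, b]
        stream.reset(new StringReader("c d e"));
        System.out.println(drain(stream)); // prints [c, d, e]
    }
}
```

Reusing one instance via `reset` avoids allocating a new stream per input, which is the usual motivation for a method of that shape.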
Methods inherited from class org.apache.lucene.analysis.TokenStream: getOnlyUseNewAPI, next, next, setOnlyUseNewAPI
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
public CJKTokenizer(java.io.Reader in)
Parameters: in - I/O reader
public CJKTokenizer(org.apache.lucene.util.AttributeSource source, java.io.Reader in)
public CJKTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, java.io.Reader in)
public boolean incrementToken() throws java.io.IOException
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException - if a read error occurs
public final void end()
end in class org.apache.lucene.analysis.TokenStream
public void reset() throws java.io.IOException
reset in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException
public void reset(java.io.Reader reader) throws java.io.IOException
reset in class org.apache.lucene.analysis.Tokenizer
Throws: java.io.IOException
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.