public class ShingleAnalyzerWrapper extends Analyzer
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.
A shingle is another name for a token-based n-gram.
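To make the term concrete, the following plain-Java sketch builds shingles from a list of already-tokenized words. It does not use Lucene: the class name `ShingleSketch`, the method `shingles`, the single-space separator, and the output order are assumptions made for illustration and may differ from what ShingleFilter actually emits. The two parameters are meant to mirror the `maxShingleSize` and `outputUnigrams` settings of this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene.
class ShingleSketch {

    // Returns every run of 2..maxShingleSize consecutive tokens, joined
    // with a single space (an assumed separator). When outputUnigrams is
    // true, the single tokens are included as well.
    static List<String> shingles(List<String> tokens, int maxShingleSize, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int size = 1; size <= maxShingleSize && start + size <= tokens.size(); size++) {
                if (size > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + size - 1));
                if (size > 1 || outputUnigrams) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // prints [please, please divide, divide, divide this, this, this sentence, sentence]
        System.out.println(shingles(tokens, 2, true));
        // prints [please divide, divide this, this sentence]
        System.out.println(shingles(tokens, 2, false));
    }
}
```

With unigrams switched off, only the multi-token shingles remain, which is roughly the effect one would expect from `setOutputUnigrams(false)`.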
Modifier and Type | Field and Description |
---|---|
protected Analyzer | defaultAnalyzer |
protected int | maxShingleSize |
protected boolean | outputUnigrams |

Fields inherited from class Analyzer: overridesTokenStreamMethod
Constructor and Description |
---|
ShingleAnalyzerWrapper(): Wraps StandardAnalyzer. |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer) |
ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize) |
ShingleAnalyzerWrapper(int nGramSize) |
Modifier and Type | Method and Description |
---|---|
int | getMaxShingleSize(): The max shingle (n-gram) size. |
boolean | isOutputUnigrams() |
TokenStream | reusableTokenStream(java.lang.String fieldName, java.io.Reader reader): Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
void | setMaxShingleSize(int maxShingleSize): Set the maximum size of output shingles. |
void | setOutputUnigrams(boolean outputUnigrams): Shall the filter pass the original tokens (the "unigrams") to the output stream? |
TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |

Methods inherited from class Analyzer: close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
protected Analyzer defaultAnalyzer
protected int maxShingleSize
protected boolean outputUnigrams
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer)
public ShingleAnalyzerWrapper(Analyzer defaultAnalyzer, int maxShingleSize)
public ShingleAnalyzerWrapper()
Wraps StandardAnalyzer.
public ShingleAnalyzerWrapper(int nGramSize)
public int getMaxShingleSize()
public void setMaxShingleSize(int maxShingleSize)
Parameters: maxShingleSize - max shingle size
public boolean isOutputUnigrams()
public void setOutputUnigrams(boolean outputUnigrams)
Parameters: outputUnigrams - Whether or not the filter shall pass the original tokens to the output stream
public TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Specified by: tokenStream in class Analyzer
public TokenStream reusableTokenStream(java.lang.String fieldName, java.io.Reader reader) throws java.io.IOException
Overrides: reusableTokenStream in class Analyzer
Throws: java.io.IOException
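The reuse contract of reusableTokenStream ("re-used from the previous time that the same thread called this method") can be pictured with a small per-thread cache. The sketch below is plain Java and only illustrates the idea: `PerThreadReuse` and its `get` method are hypothetical names, a `StringBuilder` stands in for a TokenStream, and nothing here is taken from Lucene's actual implementation.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not Lucene's implementation.
class PerThreadReuse<T> {
    // Holds one saved instance per calling thread.
    private final ThreadLocal<T> previous = new ThreadLocal<>();
    private final Supplier<T> factory;

    PerThreadReuse(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the instance saved by this thread's previous call,
    // or creates and saves a new one on the thread's first call.
    T get() {
        T value = previous.get();
        if (value == null) {
            value = factory.get();
            previous.set(value);
        }
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadReuse<StringBuilder> streams = new PerThreadReuse<>(StringBuilder::new);
        StringBuilder first = streams.get();
        StringBuilder second = streams.get();
        System.out.println(first == second); // prints true: same thread, same instance

        StringBuilder[] other = new StringBuilder[1];
        Thread worker = new Thread(() -> other[0] = streams.get());
        worker.start();
        worker.join();
        System.out.println(first == other[0]); // prints false: another thread gets its own instance
    }
}
```

One consequence of this kind of reuse is that a caller should finish consuming one stream before requesting the next on the same thread, since both requests may hand back the same object.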
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.