public final class CzechAnalyzer extends Analyzer
Analyzer for the Czech language.
Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.
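As a rough illustration of the external stopword list mentioned above, the following sketch reads one word per line from an InputStream using a named encoding, falling back to the platform default when the encoding is null. It uses only the JDK, not Lucene; the class name StopWordListSketch, the one-word-per-line format, and the skipping of blank lines are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class StopWordListSketch {

    // Reads one stop word per line; a null encoding selects the platform default charset.
    static Set<String> load(InputStream wordfile, String encoding) {
        Charset charset = (encoding == null) ? Charset.defaultCharset() : Charset.forName(encoding);
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(wordfile, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {   // assumption: blank lines are ignored
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        byte[] data = "a\nale\n\nnebo\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(load(new ByteArrayInputStream(data), "UTF-8"));   // prints [a, ale, nebo]
    }
}
```

Other charset names, such as the iso-8859-2 encoding named in the loadStopWords parameter documentation, can be passed the same way wherever the running JDK supports them.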
Modifier and Type | Field and Description |
---|---|
static java.lang.String[] | CZECH_STOP_WORDS List of typical stopwords. |
Fields inherited from class Analyzer: overridesTokenStreamMethod
Constructor and Description |
---|
CzechAnalyzer() Deprecated. Use CzechAnalyzer(Version) instead. |
CzechAnalyzer(java.io.File stopwords) Deprecated. Use CzechAnalyzer(Version, File) instead. |
CzechAnalyzer(java.util.HashSet stopwords) Deprecated. Use CzechAnalyzer(Version, HashSet) instead. |
CzechAnalyzer(java.lang.String[] stopwords) Deprecated. Use CzechAnalyzer(Version, String[]) instead. |
CzechAnalyzer(Version matchVersion) Builds an analyzer with the default stop words (CZECH_STOP_WORDS). |
CzechAnalyzer(Version matchVersion, java.io.File stopwords) Builds an analyzer with the given stop words. |
CzechAnalyzer(Version matchVersion, java.util.HashSet stopwords) |
CzechAnalyzer(Version matchVersion, java.lang.String[] stopwords) Builds an analyzer with the given stop words. |
Modifier and Type | Method and Description |
---|---|
void | loadStopWords(java.io.InputStream wordfile, java.lang.String encoding) Loads stopwords hash from resource stream (file, database...). |
TokenStream | reusableTokenStream(java.lang.String fieldName, java.io.Reader reader) Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader. |
TokenStream | tokenStream(java.lang.String fieldName, java.io.Reader reader) Creates a TokenStream which tokenizes all the text in the provided Reader. |
Methods inherited from class Analyzer: close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
public static final java.lang.String[] CZECH_STOP_WORDS
List of typical stopwords.

public CzechAnalyzer()
Deprecated. Use CzechAnalyzer(Version) instead.
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

public CzechAnalyzer(Version matchVersion)
Builds an analyzer with the default stop words (CZECH_STOP_WORDS).

public CzechAnalyzer(java.lang.String[] stopwords)
Deprecated. Use CzechAnalyzer(Version, String[]) instead.

public CzechAnalyzer(Version matchVersion, java.lang.String[] stopwords)
Builds an analyzer with the given stop words.

public CzechAnalyzer(java.util.HashSet stopwords)
Deprecated. Use CzechAnalyzer(Version, HashSet) instead.

public CzechAnalyzer(Version matchVersion, java.util.HashSet stopwords)

public CzechAnalyzer(java.io.File stopwords) throws java.io.IOException
Deprecated. Use CzechAnalyzer(Version, File) instead.
Throws: java.io.IOException

public CzechAnalyzer(Version matchVersion, java.io.File stopwords) throws java.io.IOException
Builds an analyzer with the given stop words.
Throws: java.io.IOException

public void loadStopWords(java.io.InputStream wordfile, java.lang.String encoding)
Loads stopwords hash from resource stream (file, database...).
Parameters:
wordfile - File containing the wordlist
encoding - Encoding used (win-1250, iso-8859-2, ...), null for default system encoding

public final TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
Creates a TokenStream which tokenizes all the text in the provided Reader.
Specified by: tokenStream in class Analyzer
Returns: A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
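
To illustrate the order of the chain named above (tokenize, then lower-case, then remove stop words), here is a minimal JDK-only sketch. It does not use Lucene: a simple split on characters that are neither letters nor digits stands in for StandardTokenizer, StandardFilter is omitted, and the class name AnalysisChainSketch and the tiny stop set are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the filter order, not Lucene's implementation.
final class AnalysisChainSketch {

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Stand-in tokenizer: split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue;   // split can yield an empty leading piece
            }
            String lower = raw.toLowerCase(Locale.ROOT);   // lower-casing step
            if (!stopWords.contains(lower)) {              // stop word removal step
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("a", "v", "na"));
        System.out.println(analyze("Praha a Brno v zime", stop));   // prints [praha, brno, zime]
    }
}
```

Because lower-casing runs before the stop word check in this sketch, a capitalized stop word at the start of a sentence is removed as well.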

public TokenStream reusableTokenStream(java.lang.String fieldName, java.io.Reader reader) throws java.io.IOException
Returns a (possibly reused) TokenStream which tokenizes all the text in the provided Reader.
Overrides: reusableTokenStream in class Analyzer
Returns: A TokenStream built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, and StopFilter
Throws: java.io.IOException
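
The phrase "possibly reused" can be pictured as a per-thread cache: the chain is built once per thread and then re-pointed at each new Reader instead of being allocated again. The sketch below is JDK-only and hypothetical (ReusableChainSketch, SavedChain, and their members are invented names). It assumes the kind of per-thread storage that the inherited getPreviousTokenStream and setPreviousTokenStream accessors suggest, and it is not Lucene's code.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration of per-thread reuse, not Lucene's implementation.
final class ReusableChainSketch {

    // Stand-in for a tokenizer plus its filters.
    static final class SavedChain {
        Reader input;
        int resets;   // counts how often the chain was re-pointed at a new Reader

        SavedChain(Reader input) {
            this.input = input;
        }

        void reset(Reader next) {
            this.input = next;
            resets++;
        }
    }

    // One cached chain per thread, so callers on different threads never share state.
    private final ThreadLocal<SavedChain> previous = new ThreadLocal<>();

    SavedChain reusableChain(Reader reader) {
        SavedChain chain = previous.get();
        if (chain == null) {
            chain = new SavedChain(reader);   // first call on this thread: build the chain
            previous.set(chain);
        } else {
            chain.reset(reader);              // later calls: reuse it with the new input
        }
        return chain;
    }

    public static void main(String[] args) {
        ReusableChainSketch analyzer = new ReusableChainSketch();
        SavedChain first = analyzer.reusableChain(new StringReader("jedna"));
        SavedChain second = analyzer.reusableChain(new StringReader("dva"));
        System.out.println(first == second);   // prints true: same object, new input
    }
}
```

One consequence of a design like this is that a stream obtained from an earlier call should not be consumed after the next call on the same thread, since both references point to the same object.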
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.