public class ArabicLetterTokenizer
extends org.apache.lucene.analysis.LetterTokenizer
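Tokenizer that breaks text into runs of letters and diacritics. The standard Letter tokenizer fails on diacritics: it accepts only characters in the Unicode Letter category, so combining marks such as the Arabic short-vowel signs (harakat) act as separators, splitting a word into fragments and dropping the marks. Similar handling is needed for Indic scripts, Hebrew, Thaana, and other scripts that attach vowel signs or other marks to base letters.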
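To make the failure concrete, here is a minimal JDK-only sketch of the letters-only rule the standard Letter tokenizer is built on (Character.isLetter). It is not Lucene's implementation; the class LetterOnlySplit, the method splitOnNonLetters, and the sample word (kataba, written with a fatha after each consonant) are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterOnlySplit {
    // Splits text into maximal runs of characters accepted by Character.isLetter.
    // This mimics a letters-only tokenizer rule; it is a sketch, not Lucene code.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any non-letter, including a combining vowel sign, ends the token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "kataba": each consonant is followed by a fatha (U+064E, category Mn)
        String vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E";
        List<String> tokens = splitOnNonLetters(vowelled);
        System.out.println(tokens.size()); // 3
    }
}
```

Because each fatha is a non-spacing mark rather than a letter, isLetter rejects it, and the single word comes back as three one-letter fragments with its vowel signs discarded.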
Constructor and Description |
---|
ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, java.io.Reader in) |
ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource source, java.io.Reader in) |
ArabicLetterTokenizer(java.io.Reader in) |
Modifier and Type | Method and Description |
---|---|
protected boolean | isTokenChar(char c): accepts c as a token character if it is in the Unicode Letter category or the NonspacingMark (Mn) category |
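The acceptance rule described for isTokenChar can be checked in isolation with the JDK alone. This sketch restates it as a static method in a hypothetical class ArabicTokenCharRule; it mirrors the documented behaviour (Letter category or NonspacingMark category) and assumes the parent rule is Character.isLetter, so treat it as an illustration rather than the library source.

```java
public class ArabicTokenCharRule {
    // Documented rule: a Unicode letter (the parent Letter tokenizer's test,
    // assumed here to be Character.isLetter) or a non-spacing mark (category Mn).
    static boolean isTokenChar(char c) {
        return Character.isLetter(c) || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    public static void main(String[] args) {
        String vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"; // "kataba" with fatha marks
        boolean allAccepted = true;
        for (char c : vowelled.toCharArray()) {
            allAccepted &= isTokenChar(c);
        }
        System.out.println(allAccepted);           // true: the whole word is one token
        System.out.println(isTokenChar(' '));      // false: a space still separates tokens
        System.out.println(isTokenChar('\u0660')); // false: Arabic-Indic digit zero is Nd, not a letter
    }
}
```

Accepting Mn characters keeps the harakat inside the token, while digits, punctuation, and whitespace still end it, exactly as they would for the letters-only rule.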
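Methods inherited from class org.apache.lucene.analysis.CharTokenizer: end, incrementToken, next (two overloads), normalize, reset
Methods inherited from class org.apache.lucene.analysis.TokenStream: getOnlyUseNewAPI, reset, setOnlyUseNewAPI
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString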
public ArabicLetterTokenizer(java.io.Reader in)
public ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource source, java.io.Reader in)
public ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, java.io.Reader in)
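Each constructor takes a java.io.Reader as its input. As a rough JDK-only illustration of what a character-based tokenizer does with that Reader, the sketch below reads one char at a time and emits each maximal run of token characters (letter or non-spacing mark) with its start and end offsets. ReaderRunTokenizer, Token, and tokenize are hypothetical names; the real tokenizer reports term text and offsets through attributes on its AttributeSource and handles details (such as buffering and any maximum token length) that this sketch ignores.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReaderRunTokenizer {
    // One emitted token: its text plus start (inclusive) and end (exclusive) offsets.
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Letter or non-spacing mark, matching the documented isTokenChar rule.
    static boolean isTokenChar(char c) {
        return Character.isLetter(c) || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Reads chars from the Reader and emits every maximal run of token chars.
    static List<Token> tokenize(Reader in) throws IOException {
        List<Token> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int offset = 0; // index of the char just read
        int start = -1; // offset where the current run began
        int ch;
        while ((ch = in.read()) != -1) {
            char c = (char) ch;
            if (isTokenChar(c)) {
                if (buf.length() == 0) {
                    start = offset;
                }
                buf.append(c);
            } else if (buf.length() > 0) {
                tokens.add(new Token(buf.toString(), start, offset));
                buf.setLength(0);
            }
            offset++;
        }
        if (buf.length() > 0) { // flush a run that reaches end of input
            tokens.add(new Token(buf.toString(), start, offset));
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Two vowelled words separated by a space: "kataba" and "darasa"
        String text = "\u0643\u064E\u062A\u064E\u0628\u064E \u062F\u064E\u0631\u064E\u0633\u064E";
        for (Token t : tokenize(new StringReader(text))) {
            System.out.println(t.start + "-" + t.end);
        }
    }
}
```

For this input the program prints 0-6 and 7-13: each word is a single token whose span covers both its letters and its marks, and the space at offset 6 is the only separator.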
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.