Class CharacterTokenization
- All Implemented Interfaces:
Serializable, Annotatable, SymbolTokenization, Changeable
Implementation of SymbolTokenization which binds symbols to single Unicode characters.
Many alphabets (including all the simple built-in alphabets such as DNA, RNA
and Protein) have an instance of CharacterTokenization
registered under the name "token", so that you can write
CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token");
and expect it to work. When
you construct a new instance of this class for an alphabet, it starts
with no associations between Symbols and characters. It is
your responsibility to populate the new tokenization appropriately.
- Since:
- 1.2
- Author:
- Thomas Down, Matthew Pocock, Greg Cox, Keith James
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder
Nested classes/interfaces inherited from interface org.biojava.bio.seq.io.SymbolTokenization
SymbolTokenization.TokenType
-
Field Summary
Fields inherited from interface org.biojava.bio.Annotatable
ANNOTATION
Fields inherited from interface org.biojava.bio.seq.io.SymbolTokenization
CHARACTER, FIXEDWIDTH, SEPARATED, UNKNOWN
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
void bindSymbol(Symbol s, char c): Bind a Symbol to a character.
getAlphabet(): The alphabet to which this tokenization applies.
getAnnotation(): Should return the associated annotation object.
protected Symbol[] getTokenTable()
getTokenType(): Determine the style of tokenization represented by this object.
parseStream(SeqIOListener listener): Return an object which can parse an arbitrary character stream into symbols.
parseToken(String token): Returns the symbol for a single token.
protected Symbol parseTokenChar(char c)
tokenizeSymbol(Symbol s): Return a token representing a single symbol.
tokenizeSymbolList(SymbolList sl): Return a string representation of a list of symbols.
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
-
Constructor Details
-
CharacterTokenization
-
-
Method Details
-
getAlphabet
Description copied from interface: SymbolTokenization
The alphabet to which this tokenization applies.
- Specified by:
- getAlphabet in interface SymbolTokenization
-
getTokenType
Description copied from interface: SymbolTokenization
Determine the style of tokenization represented by this object.
- Specified by:
- getTokenType in interface SymbolTokenization
-
getAnnotation
Description copied from interface: Annotatable
Should return the associated annotation object.
- Specified by:
- getAnnotation in interface Annotatable
- Returns:
- an Annotation object, never null
-
bindSymbol
Bind a Symbol to a character.
This method ensures that when this char is observed, it resolves to this symbol. If the char was previously associated with another symbol, the old binding is removed. If this is the first time the symbol has been bound to any character, this character becomes the default tokenization of the Symbol, that is, the char used when converting symbols into characters. If the symbol has previously been bound to another character, this char will not be produced when stringifying the symbol, but the symbol will still be produced when parsing this character.
- Parameters:
s - the Symbol to bind
c - the char to bind it to
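The binding rules described for bindSymbol can be illustrated with a small stand-alone model. This is only a sketch that follows the wording above, not BioJava code: the class BindingSketch is hypothetical, symbols are represented as plain strings, and the two maps are an assumption about one way to realize the documented behaviour. It does not try to define what happens to the default character of a symbol whose character is taken over by another symbol.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in (not the BioJava class): models the binding rules with two maps. */
final class BindingSketch {
    // char -> symbol name, consulted when parsing characters
    private final Map<Character, String> charToSymbol = new HashMap<>();
    // symbol name -> default char, consulted when turning symbols into characters
    private final Map<String, Character> symbolToChar = new HashMap<>();

    void bindSymbol(String symbol, char c) {
        // A char resolves to exactly one symbol; put() replaces any earlier binding.
        charToSymbol.put(c, symbol);
        // Only the first char bound to a symbol becomes its default token.
        symbolToChar.putIfAbsent(symbol, c);
    }

    String parseTokenChar(char c) {
        String symbol = charToSymbol.get(c);
        if (symbol == null) {
            throw new IllegalArgumentException("No symbol bound to '" + c + "'");
        }
        return symbol;
    }

    char tokenizeSymbol(String symbol) {
        Character c = symbolToChar.get(symbol);
        if (c == null) {
            throw new IllegalArgumentException("No char bound to " + symbol);
        }
        return c;
    }

    public static void main(String[] args) {
        BindingSketch t = new BindingSketch();
        t.bindSymbol("adenine", 'a'); // first binding: 'a' is the default token
        t.bindSymbol("adenine", 'A'); // second binding: parse-only alias
        System.out.println(t.parseTokenChar('A'));      // prints "adenine"
        System.out.println(t.tokenizeSymbol("adenine")); // prints "a"
    }
}
```

In this model, binding both 'a' and 'A' to the same symbol gives case-insensitive parsing while output stays in the case of the first binding.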
-
parseToken
Description copied from interface: SymbolTokenization
Returns the symbol for a single token.
The Symbol will be a member of the alphabet. If the token is not recognized as mapping to a symbol, an exception will be thrown.
- Specified by:
- parseToken in interface SymbolTokenization
- Parameters:
token - the token to retrieve a Symbol for
- Returns:
- the Symbol for that token
- Throws:
IllegalSymbolException - if there is no Symbol for the token
-
getTokenTable
-
parseTokenChar
- Throws:
IllegalSymbolException
-
tokenizeSymbol
Description copied from interface: SymbolTokenization
Return a token representing a single symbol.
- Specified by:
- tokenizeSymbol in interface SymbolTokenization
- Parameters:
s - The symbol
- Throws:
IllegalSymbolException - if the symbol isn't recognized.
-
tokenizeSymbolList
Description copied from interface: SymbolTokenization
Return a string representation of a list of symbols.
- Specified by:
- tokenizeSymbolList in interface SymbolTokenization
- Parameters:
sl - A SymbolList
- Throws:
IllegalAlphabetException - if alphabets don't match
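For a tokenization that maps each symbol to one character, a natural reading of tokenizeSymbolList is "check that the list belongs to the same alphabet, then concatenate one character per symbol". The sketch below models that reading only. ListTokenizerSketch, its bind method, the use of alphabet names as plain strings, and IllegalArgumentException in place of the BioJava exception types are all assumptions made for illustration, not the library's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in (not BioJava code): symbols and alphabets are plain strings. */
final class ListTokenizerSketch {
    private final String alphabetName;
    private final Map<String, Character> symbolToChar = new HashMap<>();

    ListTokenizerSketch(String alphabetName) {
        this.alphabetName = alphabetName;
    }

    void bind(String symbol, char c) {
        // Keep the first char bound to a symbol as its output token.
        symbolToChar.putIfAbsent(symbol, c);
    }

    String tokenizeSymbolList(String listAlphabetName, List<String> symbols) {
        // Stand-in for the "alphabets don't match" failure.
        if (!alphabetName.equals(listAlphabetName)) {
            throw new IllegalArgumentException(
                "Alphabet mismatch: " + listAlphabetName + " vs " + alphabetName);
        }
        StringBuilder sb = new StringBuilder(symbols.size());
        for (String symbol : symbols) {
            Character c = symbolToChar.get(symbol);
            if (c == null) {
                throw new IllegalArgumentException("Unbound symbol: " + symbol);
            }
            sb.append(c.charValue()); // one character per symbol
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ListTokenizerSketch t = new ListTokenizerSketch("DNA");
        t.bind("guanine", 'g');
        t.bind("adenine", 'a');
        t.bind("thymine", 't');
        List<String> symbols = Arrays.asList("guanine", "adenine", "thymine");
        System.out.println(t.tokenizeSymbolList("DNA", symbols)); // prints "gat"
    }
}
```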
-
parseStream
Description copied from interface: SymbolTokenization
Return an object which can parse an arbitrary character stream into symbols.
- Specified by:
- parseStream in interface SymbolTokenization
- Parameters:
listener - The listener which gets notified of parsed symbols.
-