Class SoftMaskedAlphabet
- All Implemented Interfaces:
Annotatable, Alphabet, FiniteAlphabet, Changeable
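For example, a soft-masked DNA sequence in FASTA format might look like this:
>DNA_sequence
ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT
Here the lowercase region, a run of ggt repeats, is masked because it has low complexity.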
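To make the lowercase convention concrete, here is a minimal plain-Java sketch that locates the soft-masked runs in a sequence string. The class `MaskedRegions` and its method `find` are hypothetical helpers, not part of BioJava. The sketch assumes the lowercase-means-masked rule and reports 0-based, end-exclusive coordinates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: finds runs of lowercase
// characters, i.e. the soft-masked regions under the lowercase convention.
final class MaskedRegions {

    // Returns {start, end} pairs using 0-based, end-exclusive coordinates.
    static List<int[]> find(String seq) {
        List<int[]> regions = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a masked run"
        for (int i = 0; i < seq.length(); i++) {
            boolean masked = Character.isLowerCase(seq.charAt(i));
            if (masked && start < 0) {
                start = i;                         // a masked run begins
            } else if (!masked && start >= 0) {
                regions.add(new int[] {start, i}); // the run ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return regions;
    }

    public static void main(String[] args) {
        String seq = "ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT";
        for (int[] r : find(seq)) {
            System.out.println(r[0] + "-" + r[1] + ": " + seq.substring(r[0], r[1]));
        }
    }
}
```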
SoftMaskedAlphabets come with SymbolTokenizers that understand how to read and write soft masking. What counts as a masked region is determined by an implementation of MaskingDetector. The DEFAULT field of the MaskingDetector interface treats lowercase tokens as masked.
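The documented DEFAULT rule can be illustrated with a small plain-Java interface. The sketch below is a hypothetical stand-in for SoftMaskedAlphabet.MaskingDetector: the interface name `TokenMaskingDetector` and the methods `isMasked`, `mask` and `unmask` are assumptions made for illustration, not BioJava's actual method set. Only the lowercase-means-masked behavior comes from the text above.

```java
import java.util.Locale;

// Hypothetical stand-in for SoftMaskedAlphabet.MaskingDetector. The method
// names are illustrative assumptions, not BioJava's actual interface.
interface TokenMaskingDetector {
    boolean isMasked(String token); // does this token denote a masked symbol?
    String mask(String token);      // render a token in its masked form
    String unmask(String token);    // render a token in its unmasked form

    // Mirrors the documented DEFAULT behavior: lowercase tokens are masked.
    TokenMaskingDetector LOWER_CASE = new TokenMaskingDetector() {
        public boolean isMasked(String token) {
            // Masked if the token is entirely lowercase and contains at least
            // one cased letter, so tokens such as "-" count as unmasked.
            return token.equals(token.toLowerCase(Locale.ROOT))
                && !token.equals(token.toUpperCase(Locale.ROOT));
        }

        public String mask(String token) { return token.toLowerCase(Locale.ROOT); }

        public String unmask(String token) { return token.toUpperCase(Locale.ROOT); }
    };
}

final class MaskingDemo {
    public static void main(String[] args) {
        TokenMaskingDetector d = TokenMaskingDetector.LOWER_CASE;
        System.out.println(d.isMasked("g") + " " + d.isMasked("G") + " " + d.isMasked("-"));
    }
}
```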
Copyright (c) 2004 Novartis Institute for Tropical Diseases
- Version: 1.0
- Author: Mark Schreiber
-
Nested Class Summary
Nested Classes
- class: This SymbolTokenizer works with a delegate to apply soft masking to symbol tokenization as appropriate.
- static interface SoftMaskedAlphabet.MaskingDetector: Implementations define how soft masking looks.
Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder
-
Field Summary
Fields inherited from interface org.biojava.bio.symbol.Alphabet
EMPTY_ALPHABET, PARSERS, SYMBOLS
Fields inherited from interface org.biojava.bio.Annotatable
ANNOTATION
-
Method Summary
- void addSymbol: SoftMaskedAlphabets cannot add new Symbols.
- boolean contains: Returns whether or not this Alphabet contains the symbol.
- List getAlphabets: Gets the components of the Alphabet.
- Symbol getAmbiguity: This is not supported.
- Annotation getAnnotation: The SoftMaskedAlphabet has no annotation.
- protected FiniteAlphabet getDelegate: The compound alphabet that holds the symbols used by this wrapper.
- Symbol getGapSymbol: Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
- static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask): Generates a soft-masked Alphabet where lowercase tokens are assumed to be soft masked.
- static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector): Creates a compound alphabet that is a hybrid of the alphabet to be soft masked and a binary alphabet that indicates whether any Symbol is soft masked.
- FiniteAlphabet getMaskedAlphabet: Gets the Alphabet upon which masking is being applied.
- MaskingDetector getMaskingDetector: Getter for the MaskingDetector.
- String getName: The name of the Alphabet.
- Symbol getSymbol: Gets the compound symbol composed of the Symbols in the List.
- SymbolTokenization getTokenization: Get a SymbolTokenization by name.
- boolean isMasked: Determines if a Symbol is masked.
- Iterator iterator: Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
- void removeSymbol: SoftMaskedAlphabets cannot remove Symbols.
- int size: The number of symbols in the alphabet.
- void validate: Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
-
Method Details
-
getInstance
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask) throws IllegalAlphabetException
Generates a soft-masked Alphabet where lowercase tokens are assumed to be soft masked.
- Parameters:
alphaToMask - for example, the DNA alphabet.
- Returns:
a reference to a singleton SoftMaskedAlphabet.
- Throws:
IllegalAlphabetException - if it cannot be constructed.
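The Returns entry says getInstance hands back a singleton. One plausible way to get that behavior is a factory that caches one instance per combination of arguments; the plain-Java sketch below shows that pattern. `MaskedAlphabetSketch` is hypothetical, and its two strings stand in for the wrapped FiniteAlphabet and the MaskingDetector. This is an illustration of the pattern, not BioJava's code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-argument singleton factory. The two strings stand in for
// the wrapped FiniteAlphabet and the MaskingDetector; not BioJava code.
final class MaskedAlphabetSketch {
    private static final Map<List<String>, MaskedAlphabetSketch> CACHE = new HashMap<>();

    final String baseName;
    final String detectorName;

    private MaskedAlphabetSketch(String baseName, String detectorName) {
        this.baseName = baseName;
        this.detectorName = detectorName;
    }

    // The same arguments always yield the same instance.
    static synchronized MaskedAlphabetSketch getInstance(String baseName, String detectorName) {
        List<String> key = Arrays.asList(baseName, detectorName);
        return CACHE.computeIfAbsent(key, k -> new MaskedAlphabetSketch(baseName, detectorName));
    }

    // The one-argument form falls back to a default detector, mirroring the
    // lowercase-masking default described in the documentation.
    static MaskedAlphabetSketch getInstance(String baseName) {
        return getInstance(baseName, "DEFAULT");
    }

    public static void main(String[] args) {
        System.out.println(getInstance("DNA") == getInstance("DNA", "DEFAULT")); // true
    }
}
```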
-
getInstance
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector) throws IllegalAlphabetException
Creates a compound alphabet that is a hybrid of the alphabet to be soft masked and a binary alphabet that indicates whether any Symbol is soft masked.
- Parameters:
alphaToMask - for example, the DNA alphabet.
maskingDetector - defines the masking behavior.
- Returns:
a reference to a singleton SoftMaskedAlphabet.
- Throws:
IllegalAlphabetException - if it cannot be constructed.
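The hybrid described here can be modeled as a cross product: every symbol of the wrapped alphabet appears once unmasked and once masked. The plain-Java sketch below is a simplified model under that assumption, with chars standing in for Symbols and a boolean standing in for the binary component. `MaskedSymbol` and `HybridAlphabet` are hypothetical names, not BioJava classes. In this model `size()` is twice the size of the wrapped alphabet, and iteration visits each compound symbol exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical compound symbol: a wrapped symbol plus a mask flag.
final class MaskedSymbol {
    final char base;
    final boolean masked;

    MaskedSymbol(char base, boolean masked) {
        this.base = base;
        this.masked = masked;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MaskedSymbol)) return false;
        MaskedSymbol other = (MaskedSymbol) o;
        return base == other.base && masked == other.masked;
    }

    @Override public int hashCode() { return base * 2 + (masked ? 1 : 0); }

    @Override public String toString() { return "(" + base + "," + (masked ? 1 : 0) + ")"; }
}

// Hypothetical cross product of a wrapped alphabet and a binary mask alphabet.
final class HybridAlphabet {
    private final List<MaskedSymbol> symbols;

    HybridAlphabet(String wrappedSymbols) {
        List<MaskedSymbol> all = new ArrayList<>();
        for (char c : wrappedSymbols.toCharArray()) {
            all.add(new MaskedSymbol(c, false)); // unmasked variant
            all.add(new MaskedSymbol(c, true));  // masked variant
        }
        symbols = Collections.unmodifiableList(all);
    }

    int size() { return symbols.size(); }

    Iterator<MaskedSymbol> iterator() { return symbols.iterator(); }

    boolean contains(MaskedSymbol s) { return symbols.contains(s); }

    public static void main(String[] args) {
        HybridAlphabet dna = new HybridAlphabet("ACGT");
        System.out.println(dna.size()); // 8
        for (Iterator<MaskedSymbol> it = dna.iterator(); it.hasNext(); ) {
            System.out.print(it.next() + " ");
        }
        System.out.println();
    }
}
```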
-
getMaskedAlphabet
Gets the Alphabet upon which masking is being applied.
- Returns:
a FiniteAlphabet
-
getDelegate
The compound alphabet that holds the symbols used by this wrapper.
- Returns:
a FiniteAlphabet
-
getAnnotation
The SoftMaskedAlphabet has no annotation.
- Specified by:
getAnnotation in interface Annotatable
- Returns:
Annotation.EMPTY_ANNOTATION
-
getName
The name of the Alphabet.
-
getAlphabets
Gets the components of the Alphabet.
- Specified by:
getAlphabets in interface Alphabet
- Returns:
a List with two members: the first is the wrapped Alphabet, the second is the binary SubIntegerAlphabet.
-
getSymbol
Gets the compound symbol composed of the Symbols in the List. The List must contain two Symbols: the first from alpha (the wrapped alphabet, defined in the constructor) and the second from SUBINTEGER[0..1].
- Specified by:
getSymbol in interface Alphabet
- Parameters:
l - a List of Symbols
- Returns:
a Symbol from this alphabet.
- Throws:
IllegalSymbolException - if l is not as expected (see above).
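The contract of getSymbol can be sketched in plain Java as a lookup that accepts exactly two components and rejects anything else. `CompoundLookup` is hypothetical: a String stands in for the wrapped alphabet, Character and Integer values stand in for Symbols, the result is only a label, and IllegalArgumentException stands in for the checked IllegalSymbolException. The component order (wrapped symbol first, mask value second) follows the order that getAlphabets reports.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of getSymbol(List): exactly two entries, a symbol of the
// wrapped alphabet followed by a mask value of 0 or 1. IllegalArgumentException
// stands in for BioJava's checked IllegalSymbolException.
final class CompoundLookup {
    static final String WRAPPED = "ACGT"; // assumed wrapped alphabet

    static String getSymbol(List<?> l) {
        if (l.size() != 2) {
            throw new IllegalArgumentException("expected 2 symbols, got " + l.size());
        }
        Object base = l.get(0);
        Object mask = l.get(1);
        if (!(base instanceof Character) || WRAPPED.indexOf(((Character) base).charValue()) < 0) {
            throw new IllegalArgumentException("not in the wrapped alphabet: " + base);
        }
        if (!(mask instanceof Integer)) {
            throw new IllegalArgumentException("mask value must be an Integer: " + mask);
        }
        int bit = (Integer) mask;
        if (bit != 0 && bit != 1) {
            throw new IllegalArgumentException("mask value must be 0 or 1: " + bit);
        }
        return base + "/" + bit; // label for the compound symbol, e.g. "G/1"
    }

    public static void main(String[] args) {
        System.out.println(getSymbol(Arrays.asList('G', 1))); // G/1
    }
}
```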
-
getAmbiguity
This is not supported. Ambiguity should be handled at the level of the wrapped Alphabet. Use getSymbol(List l) instead, and provide it with an ambiguity symbol and a masking symbol.
- Specified by:
getAmbiguity in interface Alphabet
- Parameters:
s - a Set of Symbols
- Returns:
a Symbol (possibly fly-weighted) for the Set of symbols in s
- Throws:
UnsupportedOperationException
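The recommended route, resolving ambiguity in the wrapped alphabet and then combining it with a mask value, can be sketched in plain Java. `AmbiguityFirst` is hypothetical. It uses a small subset of the IUPAC nucleotide codes (R = A or G, Y = C or T, N = any base) as a stand-in for the wrapped alphabet's own ambiguity handling, and represents the compound symbol as a "code/mask" label.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: ambiguity is resolved in the wrapped alphabet first,
// and only then paired with a mask value.
final class AmbiguityFirst {
    // A few IUPAC nucleotide codes, keyed by the sorted bases they match.
    private static final Map<String, Character> IUPAC = new HashMap<>();
    static {
        IUPAC.put("AG", 'R');
        IUPAC.put("CT", 'Y');
        IUPAC.put("ACGT", 'N');
    }

    // Step 1: ambiguity at the level of the wrapped alphabet.
    static char wrappedAmbiguity(Set<Character> bases) {
        StringBuilder key = new StringBuilder();
        for (char c : new TreeSet<>(bases)) {
            key.append(c); // sorted, so {G, A} and {A, G} give the same key
        }
        Character code = IUPAC.get(key.toString());
        if (code == null) {
            throw new IllegalArgumentException("no ambiguity code for " + bases);
        }
        return code;
    }

    // Step 2: pair the ambiguity symbol with a mask value, as getSymbol(List) expects.
    static String compound(Set<Character> bases, boolean masked) {
        return wrappedAmbiguity(bases) + "/" + (masked ? 1 : 0);
    }

    public static void main(String[] args) {
        Set<Character> purines = new HashSet<>(Arrays.asList('G', 'A'));
        System.out.println(compound(purines, true)); // R/1
    }
}
```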
-
getGapSymbol
Description copied from interface: Alphabet
Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
In general, this will be a BasisSymbol that represents a list of AlphabetManager.getGapSymbol() symbols, the same length as the getAlphabets() list.
- Specified by:
getGapSymbol in interface Alphabet
- Returns:
the appropriate gap Symbol instance
-
contains
Description copied from interface: Alphabet
Returns whether or not this Alphabet contains the symbol.
An alphabet contains an ambiguity symbol if and only if the alphabet returned by the ambiguity symbol's getMatches() is a subset of this alphabet. That means that every one of the symbols that could match the ambiguity symbol is also a member of this alphabet.
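The membership rule for ambiguity symbols reduces to a subset test. The plain-Java sketch below models an alphabet and an ambiguity symbol's matches as sets of characters; `ContainsCheck` is a hypothetical name, and real BioJava code would compare Alphabet objects rather than Sets. The test used is `containsAll`, so a symbol matching the whole alphabet (such as N over A, C, G, T) counts as contained, which agrees with the statement that every symbol it could match is a member of the alphabet.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the membership rule for ambiguity symbols: an
// ambiguity symbol belongs to an alphabet iff everything it can match does.
final class ContainsCheck {
    static boolean containsAmbiguity(Set<Character> alphabet, Set<Character> matches) {
        return alphabet.containsAll(matches); // subset test, equality allowed
    }

    public static void main(String[] args) {
        Set<Character> dna = new HashSet<>(Arrays.asList('A', 'C', 'G', 'T'));
        System.out.println(containsAmbiguity(dna, new HashSet<>(Arrays.asList('A', 'G')))); // true
        System.out.println(containsAmbiguity(dna, new HashSet<>(Arrays.asList('A', 'U')))); // false
    }
}
```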
-
validate
Description copied from interface: Alphabet
Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
This method is used throughout the code to validate symbols as they enter a method. Also, the code is littered with catches for IllegalSymbolException. There is a preferred style of handling this, which should be covered in the package documentation.
- Specified by:
validate in interface Alphabet
- Parameters:
s - the Symbol to validate
- Throws:
IllegalSymbolException - if s is not contained in this alphabet
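The validate-on-entry style can be sketched in plain Java. `Validation` and its nested `IllegalSymbolException` are hypothetical stand-ins, with the exception made checked to match the catch-heavy calling style described above, and a Set of characters plays the role of the Alphabet. The `count` method shows the pattern of validating an argument before doing any work.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class Validation {
    // Hypothetical stand-in for BioJava's IllegalSymbolException.
    static final class IllegalSymbolException extends Exception {
        IllegalSymbolException(String message) { super(message); }
    }

    // Throws a precanned exception when s is not a member of the alphabet.
    static void validate(Set<Character> alphabet, char s) throws IllegalSymbolException {
        if (!alphabet.contains(s)) {
            throw new IllegalSymbolException("Symbol " + s + " not found in alphabet " + alphabet);
        }
    }

    // Typical use: validate arguments on entry, before doing any work.
    static int count(Set<Character> alphabet, String seq, char target) throws IllegalSymbolException {
        validate(alphabet, target);
        int n = 0;
        for (char c : seq.toCharArray()) {
            if (c == target) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Set<Character> dna = new HashSet<>(Arrays.asList('A', 'C', 'G', 'T'));
        try {
            System.out.println(count(dna, "GATTACA", 'A')); // 3
            count(dna, "GATTACA", 'X');                     // rejected by validate
        } catch (IllegalSymbolException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```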
-
getMaskingDetector
Getter for the MaskingDetector.
- Returns:
the MaskingDetector
-
getTokenization
Description copied from interface: Alphabet
Get a SymbolTokenization by name.
The parser returned is guaranteed to return Symbols and SymbolLists that conform to this alphabet.
Every alphabet should have a SymbolTokenization under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolTokenization under the name 'name' that uses symbol names to identify symbols. Any other names may also be defined, but the behavior of the returned SymbolTokenization is not defined here.
A SymbolTokenization under the name 'default' should be defined for all sequences; it determines the behavior when printing out a sequence. Standard behavior is to define the 'token' SymbolTokenization as the default if it exists, and otherwise to define the 'name' SymbolTokenization as the default, but other choices are possible.
- Specified by:
getTokenization in interface Alphabet
- Parameters:
type - the name of the parser
- Returns:
a parser for that name
- Throws:
BioException - if for any reason the tokenization could not be built
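A case-sensitive tokenization of the kind this class provides can be sketched as a parse step that splits each character into a base and a mask flag, and a print step that reverses it. `SoftMaskTokenizer` is hypothetical plain Java, not BioJava's nested tokenization class. It hard-codes the lowercase rule and does not validate bases against an alphabet, which a real SymbolTokenization would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical case-sensitive tokenizer: the letter gives the base and the
// case gives the mask flag (lowercase = masked). Not BioJava's class.
final class SoftMaskTokenizer {
    // One parsed position: the base in uppercase plus its mask flag.
    static final class Token {
        final char base;
        final boolean masked;

        Token(char base, boolean masked) {
            this.base = base;
            this.masked = masked;
        }
    }

    // Parse a soft-masked string into (base, masked) tokens.
    static List<Token> parse(String seq) {
        List<Token> out = new ArrayList<>(seq.length());
        for (char c : seq.toCharArray()) {
            out.add(new Token(Character.toUpperCase(c), Character.isLowerCase(c)));
        }
        return out;
    }

    // Print tokens back, writing masked positions in lowercase again.
    static String print(List<Token> tokens) {
        StringBuilder sb = new StringBuilder(tokens.size());
        for (Token t : tokens) {
            sb.append(t.masked ? Character.toLowerCase(t.base) : t.base);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT";
        List<Token> tokens = parse(seq);
        System.out.println(print(tokens).equals(seq)); // true
    }
}
```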
-
size
Description copied from interface: FiniteAlphabet
The number of symbols in the alphabet.
- Specified by:
size in interface FiniteAlphabet
- Returns:
the size of the alphabet
-
iterator
Description copied from interface: FiniteAlphabet
Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet. Each AtomicSymbol as for which this.contains(as) is true will be returned exactly once by this iterator, in no specified order.
- Specified by:
iterator in interface FiniteAlphabet
- Returns:
an Iterator over the contained AtomicSymbol objects
-
addSymbol
SoftMaskedAlphabets cannot add new Symbols. A ChangeVetoException will be thrown.
- Specified by:
addSymbol in interface FiniteAlphabet
- Parameters:
s - the Symbol to add.
- Throws:
ChangeVetoException - when called.
-
removeSymbol
SoftMaskedAlphabets cannot remove Symbols. A ChangeVetoException will be thrown.
- Specified by:
removeSymbol in interface FiniteAlphabet
- Parameters:
s - the Symbol to remove.
- Throws:
ChangeVetoException - when called.
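Because the symbol set of a SoftMaskedAlphabet is fixed, both mutators simply refuse. The plain-Java sketch below shows that shape. `FixedAlphabet` and its nested `ChangeVetoException` are hypothetical stand-ins; the sketch makes the exception checked so that callers must handle the veto explicitly.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical immutable alphabet: membership is fixed at construction.
final class FixedAlphabet {
    // Stand-in for the ChangeVetoException named in the documentation.
    static final class ChangeVetoException extends Exception {
        ChangeVetoException(String message) { super(message); }
    }

    private final Set<Character> symbols;

    FixedAlphabet(Set<Character> symbols) {
        this.symbols = Collections.unmodifiableSet(new LinkedHashSet<>(symbols));
    }

    // Every attempt to change membership is vetoed.
    void addSymbol(char s) throws ChangeVetoException {
        throw new ChangeVetoException("Cannot add " + s + ": this alphabet cannot be changed");
    }

    void removeSymbol(char s) throws ChangeVetoException {
        throw new ChangeVetoException("Cannot remove " + s + ": this alphabet cannot be changed");
    }

    int size() { return symbols.size(); }

    public static void main(String[] args) {
        FixedAlphabet a = new FixedAlphabet(new LinkedHashSet<>(Arrays.asList('A', 'C', 'G', 'T')));
        try {
            a.addSymbol('U');
        } catch (ChangeVetoException e) {
            System.out.println("vetoed: " + e.getMessage());
        }
        System.out.println(a.size()); // 4
    }
}
```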
-
isMasked
Determines if a Symbol is masked.
- Parameters:
s - the Symbol to test.
- Returns:
true if s is masked.
- Throws:
IllegalSymbolException
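Since the mask is a separate component of each compound symbol, isMasked only has to inspect that component. The plain-Java sketch below models a compound symbol as a hypothetical `Compound` class holding a base character and a mask value of 0 or 1, and uses the predicate to count masked positions. Unlike the documented method, it skips validation, so it never throws an equivalent of IllegalSymbolException.

```java
import java.util.Arrays;
import java.util.List;

final class MaskCount {
    // Hypothetical compound symbol: a wrapped symbol plus a mask value (0 or 1).
    static final class Compound {
        final char base;
        final int mask;

        Compound(char base, int mask) {
            this.base = base;
            this.mask = mask;
        }
    }

    // Analogue of isMasked: the answer depends only on the mask component.
    static boolean isMasked(Compound s) {
        return s.mask == 1;
    }

    // Example use: count the masked positions in a sequence of compound symbols.
    static int countMasked(List<Compound> seq) {
        int n = 0;
        for (Compound s : seq) {
            if (isMasked(s)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Compound> seq = Arrays.asList(
            new Compound('A', 0), new Compound('G', 1), new Compound('T', 1));
        System.out.println(countMasked(seq)); // 2
    }
}
```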
-