Class AlphabetManager

java.lang.Object
org.biojava.bio.symbol.AlphabetManager

public final class AlphabetManager extends Object
Utility methods for working with Alphabets. Also acts as a registry for well-known alphabets.

The alphabet interfaces themselves don't give you a lot of help in actually getting an alphabet instance. This is where the AlphabetManager comes in handy. It helps out in serialization, generating derived alphabets and building CrossProductAlphabet instances. It also contains limited support for parsing complex alphabet names back into the alphabets.

Author:
Matthew Pocock, Thomas Down, Mark Schreiber, George Waldon (alternate tokenization)
  • Constructor Details

  • Method Details

    • instance

      public static AlphabetManager instance()
      Deprecated.
      all AlphabetManager methods have become static
      Retrieve the singleton instance.
      Returns:
      the AlphabetManager instance
    • getAllAmbiguitySymbol

      Return the ambiguity symbol which matches all symbols in a given alphabet.
      Parameters:
      alpha - The alphabet
      Returns:
      the ambiguity symbol
      Since:
      1.2
    • getAllSymbols

      public static Set getAllSymbols(FiniteAlphabet alpha)
      Return a set containing all possible symbols which can be considered members of a given alphabet, including ambiguous symbols. Warning: this method can return large sets!
      Parameters:
      alpha - The alphabet
      Returns:
      The set of symbols that are members of alpha
      Since:
      1.2
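
      The size warning can be illustrated with a small model. This is a sketch only, assuming each ambiguity symbol corresponds to one non-empty subset of the atomic symbols; the class AllSymbolsSketch and its string-based symbols are hypothetical and are not part of BioJava.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not BioJava code: symbols are plain strings and an
// ambiguity symbol is modelled as the set of atomic symbols it matches.
class AllSymbolsSketch {

    // Enumerate every non-empty subset of the atomic symbols using a bitmask.
    static Set<Set<String>> allNonEmptySubsets(List<String> atoms) {
        if (atoms.size() > 20) {
            throw new IllegalArgumentException("too many atoms to enumerate");
        }
        Set<Set<String>> result = new LinkedHashSet<>();
        int n = atoms.size();
        for (int mask = 1; mask < (1 << n); mask++) {
            Set<String> subset = new LinkedHashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.add(atoms.get(i));
                }
            }
            result.add(subset);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dna = Arrays.asList("a", "c", "g", "t");
        // Four atomic symbols give 2^4 - 1 = 15 non-empty subsets.
        System.out.println(allNonEmptySubsets(dna).size());
    }
}
```

      Under this assumption the count doubles with every atomic symbol added, which is why a 20-letter alphabet would already yield over a million candidates.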
    • alphabetForName

      Retrieve the alphabet for a specific name.
      Parameters:
      name - the name of the alphabet
      Returns:
      the alphabet object
      Throws:
      NoSuchElementException - if there is no alphabet by that name
    • symbolForName

      public static Symbol symbolForName(String name) throws NoSuchElementException
      Deprecated.
      use symbolForLifeScienceID() instead
      Retrieve the symbol represented by a String object
      Parameters:
      name - the string whose symbol you want to get
      Returns:
      The Symbol
      Throws:
      NoSuchElementException - if the string name is invalid.
    • symbolForLifeScienceID

      Retrieves the Symbol for the LSID
      Parameters:
      lsid - the URN for the Symbol
      Returns:
      a reference to the Symbol
    • registerAlphabet

      public static void registerAlphabet(String name, Alphabet alphabet)
      Register an alphabet by name.
      Parameters:
      name - the name by which it can be retrieved
      alphabet - the Alphabet to store
    • registerAlphabet

      public static void registerAlphabet(String[] names, Alphabet alphabet)
      Register an Alphabet by more than one name. This allows aliasing of an alphabet with two or more names. It is equivalent to calling registerAlphabet(String name, Alphabet alphabet) several times.
      Parameters:
      names - the names by which it can be retrieved
      alphabet - the Alphabet to store
      Since:
      1.4
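
      The aliasing behaviour described above can be sketched with a plain map. This is a minimal stand-in, not the AlphabetManager implementation: the class RegistrySketch, its method names, and the use of Object in place of Alphabet are all assumptions made for illustration, and it does not model parsing of complex alphabet names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for a name-to-alphabet registry; values are plain Objects.
class RegistrySketch {
    private final Map<String, Object> byName = new HashMap<>();

    // Register an alphabet under a single name.
    void register(String name, Object alphabet) {
        byName.put(name, alphabet);
    }

    // Aliasing: equivalent to calling register(name, alphabet) once per name.
    void register(String[] names, Object alphabet) {
        for (String name : names) {
            register(name, alphabet);
        }
    }

    boolean registered(String name) {
        return byName.containsKey(name);
    }

    Set<String> registrations() {
        return byName.keySet();
    }

    // Look up by name, failing loudly when nothing is registered.
    Object forName(String name) {
        Object alphabet = byName.get(name);
        if (alphabet == null) {
            throw new NoSuchElementException("No alphabet for name " + name);
        }
        return alphabet;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        Object dna = new Object();
        registry.register(new String[] {"DNA", "dna"}, dna);
        // Both aliases resolve to the same instance.
        System.out.println(registry.forName("DNA") == registry.forName("dna"));
    }
}
```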
    • registrations

      public static Set registrations()
      A set of names under which Alphabets have been registered.
      Returns:
      a Set of Strings
    • registered

      public static boolean registered(String name)
      Has an Alphabet been registered by that name?
      Parameters:
      name - the name of the alphabet
      Returns:
      true if it has or false otherwise
    • alphabets

      public static Iterator alphabets()
      Get an iterator over all known alphabets.
      Returns:
      an Iterator over Alphabet objects
    • getGapSymbol

      public static Symbol getGapSymbol()

      Get the special 'gap' Symbol.

      The gap symbol is a Symbol that has an empty alphabet of matches. As such, every alphabet contains gap: since gap matches no symbols, there is no symbol matched by gap that an alphabet could fail to contain.

      Gap can be thought of as an empty sub-space within the space of all possible symbols. If you are working in a cross-product alphabet, you should choose whether to use gap to represent 'no symbol', or a basis symbol of the appropriate size built entirely of gaps to represent 'no symbol in each of the slots'. Perhaps this could be explained better.

      Returns:
      the system-wide symbol that represents a gap
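
      The "every alphabet contains gap" argument is a vacuous-truth argument, and a small model may make it concrete. The sketch below assumes that an alphabet contains a symbol when it contains every atomic symbol that the symbol matches; GapSketch and its string-based symbols are hypothetical and do not use BioJava types.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: an alphabet is a set of atomic symbol names, and a
// (possibly ambiguous) symbol is described by the set of atoms it matches.
class GapSketch {

    // Assumed containment rule: every matched atom must be in the alphabet.
    static boolean contains(Set<String> alphabet, Set<String> matches) {
        return alphabet.containsAll(matches);
    }

    public static void main(String[] args) {
        Set<String> dna = new HashSet<>(Arrays.asList("a", "c", "g", "t"));
        Set<String> gap = Collections.emptySet();       // gap matches nothing
        Set<String> purine = new HashSet<>(Arrays.asList("a", "g"));
        Set<String> foreign = new HashSet<>(Arrays.asList("a", "x"));

        System.out.println(contains(dna, gap));         // true, vacuously
        System.out.println(contains(dna, purine));      // true
        System.out.println(contains(dna, foreign));     // false, "x" is missing
    }
}
```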
    • getGapSymbol

      public static Symbol getGapSymbol(List alphas)

      Get the gap symbol appropriate to this list of alphabets.

      The gap symbol will have the same shape as the alphabet list. It will be as long as the list, and if any of the alphabets in the list have a dimension greater than 1, it will also insert the appropriate gap there.

      Parameters:
      alphas - List of alphabets
      Returns:
      the appropriate gap symbol for the alphabet list
    • createSymbol

      public static AtomicSymbol createSymbol(String name, Annotation annotation)

      Generate a new AtomicSymbol instance with a name and Annotation.

      Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.

      Parameters:
      name - the String returned by getName()
      annotation - the Annotation returned by getAnnotation()
      Returns:
      a new AtomicSymbol instance
    • createSymbol

      public static AtomicSymbol createSymbol(String name)

      Generate a new AtomicSymbol instance with a name and an Empty Annotation.

      Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.

      Parameters:
      name - the String returned by getName()
      Returns:
      a new AtomicSymbol instance
    • createSymbol

      public static AtomicSymbol createSymbol(char token, String name, Annotation annotation)
      Deprecated.
      Use the two-arg version of this method instead.

      Generate a new AtomicSymbol instance with a token, name and Annotation.

      Use this method if you wish to create an AtomicSymbol instance. Initially it will not be a member of any alphabet.

      Parameters:
      token - the char token returned by getToken() (ignored as of BioJava 1.2)
      name - the String returned by getName()
      annotation - the Annotation returned by getAnnotation()
      Returns:
      a new AtomicSymbol instance
    • createSymbol

      public static Symbol createSymbol(char token, Annotation annotation, List symList, Alphabet alpha) throws IllegalSymbolException
      Deprecated.
      use the new version, without the token argument

      Generates a new Symbol instance that represents the tuple of Symbols in symList.

      This method is most useful for writing Alphabet implementations. It should not be invoked by casual users. Use alphabet.getSymbol(List) instead.

      Parameters:
      token - the Symbol's token [ignored since 1.2]
      annotation - The annotation bundle for the symbol
      symList - a list of Symbol objects
      alpha - the Alphabet that this Symbol will reside in
      Returns:
      a Symbol that encapsulates that List
      Throws:
      IllegalSymbolException - If the Symbol cannot be made
    • createSymbol

      public static Symbol createSymbol(Annotation annotation, List symList, Alphabet alpha) throws IllegalSymbolException

      Generates a new Symbol instance that represents the tuple of Symbols in symList. This will attempt to return the same symbol for the same list.

      This method is most useful for writing Alphabet implementations. It should not be invoked by casual users. Use alphabet.getSymbol(List) instead.

      Parameters:
      annotation - The annotation bundle for the Symbol
      symList - a list of Symbol objects
      alpha - the Alphabet that this Symbol will reside in
      Returns:
      a Symbol that encapsulates that List
      Throws:
      IllegalSymbolException - If the Symbol cannot be made
    • createSymbol

      public static Symbol createSymbol(char token, Annotation annotation, Set symSet, Alphabet alpha) throws IllegalSymbolException
      Deprecated.
      use the three-arg version of this method instead.

      Generates a new Symbol instance that represents the set of Symbols in symSet.

      This method is most useful for writing Alphabet implementations. It should not be invoked by users. Use alphabet.getSymbol(Set) instead.

      Parameters:
      token - the Symbol's token [ignored since 1.2]
      annotation - the Symbol's Annotation
      symSet - a Set of Symbol objects
      alpha - the Alphabet that this Symbol will reside in
      Returns:
      a Symbol that encapsulates that Set
      Throws:
      IllegalSymbolException - If the Symbol cannot be made
    • createSymbol

      public static Symbol createSymbol(Annotation annotation, Set symSet, Alphabet alpha) throws IllegalSymbolException

      Generates a new Symbol instance that represents the set of Symbols in symSet.

      This method is most useful for writing Alphabet implementations. It should not be invoked by users. Use alphabet.getSymbol(Set) instead.

      Parameters:
      annotation - the Symbol's Annotation
      symSet - a Set of Symbol objects
      alpha - the Alphabet that this Symbol will reside in
      Returns:
      a Symbol that encapsulates that Set
      Throws:
      IllegalSymbolException - If the Symbol cannot be made
    • generateCrossProductAlphaFromName

      Generates a new CrossProductAlphabet from the given name.
      Parameters:
      name - the name to parse
      Returns:
      the associated Alphabet
    • getCrossProductAlphabet

      public static Alphabet getCrossProductAlphabet(List aList)

      Retrieve a CrossProductAlphabet instance over the alphabets in aList.

      If all of the alphabets in aList implement FiniteAlphabet then the method will return a FiniteAlphabet. Otherwise, it returns a non-finite alphabet.

      If you call this method twice with a list containing the same alphabets, it will return the same alphabet. This promotes the re-use of alphabets and helps to maintain the 'flyweight' principle for finite alphabet symbols.

      The resulting alphabet cpa will be retrievable via AlphabetManager.alphabetForName(cpa.getName())

      Parameters:
      aList - a list of Alphabet objects
      Returns:
      a CrossProductAlphabet that is over the alphabets in aList
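
      The "same list, same alphabet" guarantee is a flyweight cache. The sketch below shows one way such a cache could work; it is not the BioJava implementation. CrossProductSketch, its nested CrossProduct type, and the "(A x B)" naming scheme are assumptions made for illustration, with component alphabets reduced to their names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flyweight cache for cross-product alphabets.
class CrossProductSketch {

    // Stand-in for a cross-product alphabet: a name plus its component names.
    static final class CrossProduct {
        final String name;
        final List<String> components;

        CrossProduct(String name, List<String> components) {
            this.name = name;
            this.components = components;
        }
    }

    private static final Map<List<String>, CrossProduct> CACHE = new HashMap<>();
    private static final Map<String, CrossProduct> BY_NAME = new HashMap<>();

    // Return the cached instance for this component list, creating it once.
    static synchronized CrossProduct getCrossProduct(List<String> componentNames) {
        List<String> key = new ArrayList<>(componentNames); // defensive copy
        CrossProduct cached = CACHE.get(key);
        if (cached == null) {
            // Assumed naming scheme; used so the result can be looked up by name.
            String name = "(" + String.join(" x ", key) + ")";
            cached = new CrossProduct(name, key);
            CACHE.put(key, cached);
            BY_NAME.put(name, cached);
        }
        return cached;
    }

    // Returns null when no cross product with that name has been created.
    static synchronized CrossProduct forName(String name) {
        return BY_NAME.get(name);
    }

    public static void main(String[] args) {
        CrossProduct first = getCrossProduct(Arrays.asList("DNA", "DNA"));
        CrossProduct second = getCrossProduct(Arrays.asList("DNA", "DNA"));
        System.out.println(first == second);                 // same instance
        System.out.println(forName(first.name) == first);    // retrievable by name
    }
}
```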
    • getCrossProductAlphabet

      Attempts to create a cross product alphabet and register it under a name.
      Parameters:
      aList - A list of alphabets
      name - The name which the new alphabet will be registered under.
      Returns:
      The CrossProductAlphabet
      Throws:
      IllegalAlphabetException - If the Alphabet cannot be made or a different alphabet is already registered under this name.
    • getCrossProductAlphabet

      public static Alphabet getCrossProductAlphabet(List aList, Alphabet parent)

      Retrieve a CrossProductAlphabet instance over the alphabets in aList.

      This method is most useful for implementors of cross-product alphabets, allowing them to safely build the matches alphabets for ambiguity symbols.

      If all of the alphabets in aList implement FiniteAlphabet then the method will return a FiniteAlphabet. Otherwise, it returns a non-finite alphabet.

      If you call this method twice with a list containing the same alphabets, it will return the same alphabet. This promotes the re-use of alphabets and helps to maintain the 'flyweight' principle for finite alphabet symbols.

      The resulting alphabet cpa will be retrievable via AlphabetManager.alphabetForName(cpa.getName())

      Parameters:
      aList - a list of Alphabet objects
      parent - a parent alphabet
      Returns:
      a CrossProductAlphabet that is over the alphabets in aList
    • factorize

      public static List factorize(Alphabet alpha, Set symSet) throws IllegalSymbolException

      Return a list of BasisSymbol instances that uniquely sum up all AtomicSymbol instances in symSet. If the symbol can't be represented by a single list of BasisSymbol instances, return null.

      This method is most useful for implementers of Alphabet and Symbol. It probably should not be invoked by users.

      Parameters:
      alpha - the Alphabet instance that the Symbols are from
      symSet - the Set of AtomicSymbol instances
      Returns:
      a List of BasisSymbols
      Throws:
      IllegalSymbolException - In practice this should not be thrown. If it is, it probably indicates a subtle bug somewhere in AlphabetManager
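
      One way to read the factorization rule: a set of tuples can be summed up by a single list of per-position symbols exactly when it equals the Cartesian product of its per-position projections. The sketch below implements that reading with strings; it is an interpretation, not the BioJava algorithm, and FactorizeSketch and its width parameter are hypothetical. It assumes every tuple has exactly width elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: atomic cross-product symbols are tuples of strings.
class FactorizeSketch {

    // Returns one set of symbols per position, or null when the tuples are not
    // exactly the Cartesian product of their per-position projections.
    static List<Set<String>> factorize(int width, Set<List<String>> tuples) {
        List<Set<String>> columns = new ArrayList<>();
        long productSize = 1;
        for (int col = 0; col < width; col++) {
            Set<String> column = new LinkedHashSet<>();
            for (List<String> tuple : tuples) {
                column.add(tuple.get(col));
            }
            columns.add(column);
            productSize *= column.size();
        }
        // The tuples are always a subset of the product of their projections,
        // so equal sizes imply the two sets are equal.
        return productSize == tuples.size() ? columns : null;
    }

    public static void main(String[] args) {
        Set<List<String>> full = new HashSet<>(Arrays.asList(
                Arrays.asList("a", "x"), Arrays.asList("a", "y"),
                Arrays.asList("b", "x"), Arrays.asList("b", "y")));
        Set<List<String>> diagonal = new HashSet<>(Arrays.asList(
                Arrays.asList("a", "x"), Arrays.asList("b", "y")));

        System.out.println(factorize(2, full) != null);      // factorizes
        System.out.println(factorize(2, diagonal) == null);  // does not
    }
}
```

      The diagonal set fails because its projections {a, b} and {x, y} would also imply (a, y) and (b, x), which are not in the set.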
    • loadAlphabets

      Load additional Alphabets, defined in XML format, into the AlphabetManager's registry. These can then be retrieved by calling alphabetForName.
      Parameters:
      is - an InputSource encapsulating the document to be parsed
      Throws:
      IOException - if there is an error accessing the stream
      SAXException - if there is an error while parsing the document
      BioException - if a problem occurs when creating the new Alphabets.
      Since:
      1.3
    • getAlphabetIndex

      Get an indexer for a specified alphabet.
      Parameters:
      alpha - The alphabet to index
      Returns:
      an AlphabetIndex instance
      Since:
      1.1
    • getAlphabetIndex

      Get an indexer for an array of symbols.
      Parameters:
      syms - the Symbols to index in that order
      Returns:
      an AlphabetIndex instance
      Throws:
      IllegalSymbolException
      BioException
      Since:
      1.1
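
      An indexer over an ordered array of symbols is essentially a two-way mapping between symbols and integer positions. The sketch below illustrates that idea with strings; IndexSketch and its method names are chosen for this example and should not be read as the AlphabetIndex API. Rejecting duplicate symbols is also an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-way index between symbols (plain strings) and positions.
class IndexSketch {
    private final String[] symbols;
    private final Map<String, Integer> positions = new HashMap<>();

    // Index the symbols in the order given; duplicates are rejected.
    IndexSketch(String[] syms) {
        this.symbols = syms.clone();
        for (int i = 0; i < symbols.length; i++) {
            if (positions.put(symbols[i], i) != null) {
                throw new IllegalArgumentException("Duplicate symbol: " + symbols[i]);
            }
        }
    }

    int indexForSymbol(String sym) {
        Integer i = positions.get(sym);
        if (i == null) {
            throw new IllegalArgumentException("Unknown symbol: " + sym);
        }
        return i;
    }

    String symbolForIndex(int index) {
        return symbols[index];
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch(new String[] {"a", "c", "g", "t"});
        System.out.println(index.indexForSymbol("g"));   // 2
        System.out.println(index.symbolForIndex(3));     // t
    }
}
```

      A typical use for such an index is addressing rows of a count or probability table by symbol without hashing on every access.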