Class MotifTools

java.lang.Object
org.biojava.bio.symbol.MotifTools

public class MotifTools extends Object
MotifTools contains utility methods for sequence motifs.
Author:
Keith James
  • Method Details

    • createRegex

      public static String createRegex(SymbolList motif)

      createRegex creates a regular expression that matches the given SymbolList. Runs of identical symbols are collapsed into a single token with a repeat count, and ambiguous Symbols are transformed into character classes. For example, the nucleotide sequence "AAGCTT" becomes "A{2}GCT{2}" and "CTNNG" is expanded to "CT[ABCDGHKMNRSTVWY]{2}G".

      The character class for an ambiguity symbol is built by calling getMatches on the symbol to obtain the Alphabet of AtomicSymbols it matches, calling getAllSymbols on that Alphabet, removing any gap symbols, and tokenizing the remainder. Tokens within a character class are sorted in ascending order of their char values, as determined by Arrays.sort(char[]).

      The Alphabet of the SymbolList must be finite and must have a character token type. Regular expressions may be generated for any such SymbolList, not just DNA, RNA and protein.
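
      The transformation described above can be sketched on plain strings. This is a minimal illustration, not BioJava's implementation: the class MotifRegexSketch, the method toRegex and the expansions lookup table are hypothetical names introduced here, and the table stands in for the getMatches / getAllSymbols step. The expansion used for N is copied from the documented example output.

```java
import java.util.Arrays;
import java.util.Map;

// Illustrative stand-in that works on plain Strings; not BioJava code.
class MotifRegexSketch {

    // 'expansions' is a hypothetical lookup from an ambiguity code to the
    // characters its class should contain. Codes absent from the map are
    // emitted literally. Regex metacharacters are NOT escaped in this sketch.
    static String toRegex(String motif, Map<Character, String> expansions) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < motif.length()) {
            char c = motif.charAt(i);

            // Measure the run of identical characters starting at i.
            int j = i;
            while (j < motif.length() && motif.charAt(j) == c) {
                j++;
            }
            int runLength = j - i;

            String expansion = expansions.get(c);
            if (expansion == null) {
                regex.append(c);
            } else {
                // Order the class members by char value, as the text describes.
                char[] tokens = expansion.toCharArray();
                Arrays.sort(tokens);
                regex.append('[').append(tokens).append(']');
            }

            // A run longer than one is written once with a repeat count.
            if (runLength > 1) {
                regex.append('{').append(runLength).append('}');
            }
            i = j;
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        // Deliberately unsorted, to show the effect of Arrays.sort(char[]).
        Map<Character, String> dna = Map.of('N', "TGCAYRWSKMBDHVN");
        System.out.println(toRegex("AAGCTT", Map.of())); // A{2}GCT{2}
        System.out.println(toRegex("CTNNG", dna));       // CT[ABCDGHKMNRSTVWY]{2}G
    }
}
```

      Passing the expansion table as a parameter keeps the sketch independent of any particular alphabet, which mirrors the statement that regular expressions may be generated for any finite alphabet with character tokens.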

      Parameters:
      motif - a SymbolList.
      Returns:
      a String regular expression.
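
      Because the return value is an ordinary String, it can presumably be handed to java.util.regex. The sketch below uses only the standard library and takes the two regular expressions from the examples above as literals; MotifMatchSketch and firstMatch are hypothetical names, and the assumption that the sequence text uses the same upper-case tokens as the pattern is not stated by the documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; the regex argument stands in for a createRegex result.
class MotifMatchSketch {

    // Returns the 0-based start offset of the first match, or -1 if none.
    static int firstMatch(String regex, String sequence) {
        Matcher matcher = Pattern.compile(regex).matcher(sequence);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("A{2}GCT{2}", "GGAAGCTTCC"));             // 2
        System.out.println(firstMatch("CT[ABCDGHKMNRSTVWY]{2}G", "AACTAGGTT")); // 2
        System.out.println(firstMatch("A{2}GCT{2}", "AGCT"));                   // -1
    }
}
```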