Class DistributionTools

java.lang.Object
org.biojava.bio.dist.DistributionTools

public final class DistributionTools extends Object
A class to hold static methods for calculations and manipulations using Distributions.
Since:
1.2
Author:
Mark Schreiber, Matthew Pocock
  • Method Details

    • writeToXML

      public static void writeToXML(Distribution d, OutputStream os) throws IOException
      Writes a Distribution to XML that can be read with the readFromXML method.
      Parameters:
      d - the Distribution to write.
      os - where to write it to.
      Throws:
      IOException - if writing fails
    • readFromXML

      Reads a Distribution from XML.
      Parameters:
      is - an InputStream to read from
      Returns:
      a Distribution parameterised by the XML in is
      Throws:
      IOException - if reading from is fails
      SAXException - if is could not be processed as XML
    • randomizeDistribution

      Randomizes the weights of a Distribution.
      Parameters:
      d - the Distribution to randomize
      Throws:
      ChangeVetoException - if the Distribution is locked
    • countToDistribution

      Make a distribution from a count.
      Parameters:
      c - the count
      Returns:
      a Distribution over the same FiniteAlphabet as c and trained with the counts of c
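Training on counts can be pictured as normalising them so the weights sum to 1. The sketch below uses plain Java maps rather than BioJava's Count and Distribution types; CountSketch and toDistribution are hypothetical names for illustration, not part of the library, and the library's own training may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CountSketch {
    /** Divides every count by the sum of all counts (assumes the sum is positive). */
    static Map<String, Double> toDistribution(Map<String, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        Map<String, Double> dist = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            dist.put(e.getKey(), e.getValue() / total);
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 3.0);
        counts.put("c", 1.0);
        System.out.println(toDistribution(counts));
    }
}
```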
    • areEmissionSpectraEqual

      public static final boolean areEmissionSpectraEqual(Distribution a, Distribution b) throws BioException
      Compares the emission spectra of two distributions.
      Parameters:
      a - A Distribution with the same Alphabet as b
      b - A Distribution with the same Alphabet as a
      Returns:
      true if alphabets and symbol weights are equal for the two distributions.
      Throws:
      BioException - if one or both of the Distributions are over infinite alphabets.
      Since:
      1.2
    • areEmissionSpectraEqual

      public static final boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b) throws BioException
      Compares the emission spectra of two distribution arrays.
      Parameters:
      a - A Distribution[] consisting of Distributions over a FiniteAlphabet
      b - A Distribution[] consisting of Distributions over a FiniteAlphabet
      Returns:
      true if alphabets and symbol weights are equal for each pair of distributions. Will return false if the arrays are of unequal length.
      Throws:
      BioException - if one of the Distributions is over an infinite alphabet.
      Since:
      1.3
    • KLDistance

      public static final HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
      Calculates the Kullback-Leibler distance (relative entropy).
      Parameters:
      observed - the observed frequency of Symbols.
      expected - the expected or background frequency.
      logBase - the log base for the entropy calculation. 2 is standard.
      Returns:
      a HashMap mapping Symbol to (Double) relative entropy.
      Since:
      1.2
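The per-symbol relative entropy can be sketched in plain Java as below. This assumes the conventional term p * log_b(p / q) for observed weight p and expected weight q; whether the library computes exactly this term is an assumption, and KlSketch and klTerms are hypothetical names that do not exist in BioJava.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KlSketch {
    /**
     * Per-symbol term p * log_b(p / q). Symbols with p == 0 contribute 0.
     * Assumes both maps contain the same keys and that q > 0 wherever p > 0.
     */
    static Map<String, Double> klTerms(Map<String, Double> observed,
                                       Map<String, Double> expected,
                                       double logBase) {
        Map<String, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double q = expected.get(e.getKey());
            double term = (p == 0.0) ? 0.0 : p * Math.log(p / q) / Math.log(logBase);
            terms.put(e.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Double> observed = new LinkedHashMap<>();
        observed.put("a", 0.5);
        observed.put("c", 0.5);
        observed.put("g", 0.0);
        observed.put("t", 0.0);
        Map<String, Double> background = new LinkedHashMap<>();
        for (String s : observed.keySet()) {
            background.put(s, 0.25); // uniform background
        }
        System.out.println(klTerms(observed, background, 2.0));
    }
}
```

Summing the per-symbol terms gives the total relative entropy of observed with respect to expected.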
    • shannonEntropy

      public static final HashMap shannonEntropy(Distribution observed, double logBase)
      Calculates the Shannon entropy for a Distribution.
      Parameters:
      observed - the observed frequency of Symbols.
      logBase - the log base for the entropy calculation. 2 is standard.
      Returns:
      a HashMap mapping Symbol to (Double) entropy.
      Since:
      1.2
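A per-symbol entropy map can be sketched as follows, assuming each symbol's value is the conventional -p * log_b(p). That choice of formula is an assumption about the library, and SymbolEntropySketch and entropyTerms are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SymbolEntropySketch {
    /** Per-symbol term -p * log_b(p); zero-weight symbols map to 0. */
    static Map<String, Double> entropyTerms(Map<String, Double> observed, double logBase) {
        Map<String, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double term = (p == 0.0) ? 0.0 : -p * Math.log(p) / Math.log(logBase);
            terms.put(e.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Double> observed = new LinkedHashMap<>();
        observed.put("a", 0.5);
        observed.put("c", 0.25);
        observed.put("g", 0.25);
        observed.put("t", 0.0);
        System.out.println(entropyTerms(observed, 2.0));
    }
}
```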
    • totalEntropy

      public static double totalEntropy(Distribution observed)
      Calculates the total entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.
      Parameters:
      observed - the observed frequency of Symbols.
      Returns:
      the total entropy of the Distribution.
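The probability-weighted sum described above can be sketched as below. Because the method takes no logBase, base 2 is assumed here; that base, and the names TotalEntropySketch and totalEntropyBits, are assumptions for illustration only.

```java
final class TotalEntropySketch {
    /** Sum over symbols of p * (-log2 p); zero weights are skipped. */
    static double totalEntropyBits(double[] weights) {
        double h = 0.0;
        for (double p : weights) {
            if (p > 0.0) {
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // A uniform four-symbol distribution has about 2 bits of entropy.
        System.out.println(totalEntropyBits(new double[] {0.25, 0.25, 0.25, 0.25}));
        // A distribution concentrated on one symbol has no entropy.
        System.out.println(totalEntropyBits(new double[] {1.0, 0.0, 0.0, 0.0}));
    }
}
```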
    • bitsOfInformation

      public static final double bitsOfInformation(Distribution observed)
      Calculates the total bits of information for a Distribution.
      Parameters:
      observed - the observed frequency of Symbols.
      Returns:
      the total information content of the Distribution.
      Since:
      1.2
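One common definition of information content, used for example in sequence logos, is log2(N) minus the total entropy, where N is the alphabet size. Whether bitsOfInformation uses exactly this formula is an assumption; the sketch below only illustrates that definition, and InformationSketch and its methods are hypothetical names.

```java
final class InformationSketch {
    /** Total entropy in bits: sum of -p * log2(p) over non-zero weights. */
    static double entropyBits(double[] weights) {
        double h = 0.0;
        for (double p : weights) {
            if (p > 0.0) {
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Maximum possible entropy, log2(N), minus the observed entropy. */
    static double informationBits(double[] weights) {
        double maxEntropy = Math.log(weights.length) / Math.log(2.0);
        return maxEntropy - entropyBits(weights);
    }

    public static void main(String[] args) {
        // Fully conserved position over four symbols: about 2 bits.
        System.out.println(informationBits(new double[] {1.0, 0.0, 0.0, 0.0}));
        // Uniform position: about 0 bits.
        System.out.println(informationBits(new double[] {0.25, 0.25, 0.25, 0.25}));
    }
}
```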
    • distOverAlignment

      Equivalent to distOverAlignment(a, false, 0.0).
      Parameters:
      a - the Alignment
      Returns:
      an array of Distribution instances representing columns of the alignment
      Throws:
      IllegalAlphabetException - if the alignment alphabet is not compatible
    • jointDistOverAlignment

      public static final Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols) throws IllegalAlphabetException
      Creates a joint distribution.
      Parameters:
      a - the Alignment to build the joint Distribution over.
      countGaps - if true, gaps will be included in the distribution (not yet implemented; currently either option produces the same result)
      nullWeight - the number of pseudo counts to add to each distribution
      cols - a list of positions in the alignment to include in the joint distribution
      Returns:
      a Distribution
      Throws:
      IllegalAlphabetException - if the sequences do not all use the same alphabet
      Since:
      1.2
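The idea of a joint distribution over selected columns can be sketched in plain Java as the relative frequency of each combination of symbols seen at those columns. The sketch ignores gaps and pseudo counts and uses 0-based column indices on strings; all of these are simplifying assumptions, and JointDistSketch and jointDistribution are hypothetical names rather than library API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class JointDistSketch {
    /**
     * Relative frequency of each symbol combination at the given columns.
     * Rows are equal-length strings; cols holds 0-based column indices.
     */
    static Map<String, Double> jointDistribution(String[] rows, int[] cols) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String row : rows) {
            StringBuilder key = new StringBuilder();
            for (int c : cols) {
                key.append(row.charAt(c));
            }
            counts.merge(key.toString(), 1.0, Double::sum);
        }
        // Normalise the counts by the number of rows.
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            e.setValue(e.getValue() / rows.length);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] rows = {"acg", "aag", "tcg", "acg"};
        System.out.println(jointDistribution(rows, new int[] {0, 1}));
    }
}
```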
    • distOverAlignment

      public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight) throws IllegalAlphabetException
      Creates an array of distributions, one for each column of the alignment.
      Parameters:
      a - the Alignment to build the Distribution[] over.
      countGaps - if true, gaps will be included in the distributions
      nullWeight - the number of pseudo counts to add to each distribution. Pseudo counts do not affect gaps: if a column contains no gaps, no gap counts are added.
      Returns:
      a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
      Throws:
      IllegalAlphabetException - if the sequences do not all use the same alphabet
      Since:
      1.2
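The per-column counting with pseudo counts can be sketched as below. Rows are plain strings, '-' stands for a gap, and the pseudo count is added to every alphabet symbol but never to the gap; these representation choices, and the names ColumnDistSketch and columnDistributions, are assumptions for illustration and not the library's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnDistSketch {
    /**
     * Builds one normalised weight map per alignment column.
     * Assumes equal-length rows containing only alphabet symbols or '-'.
     * A column whose counts sum to zero yields NaN weights.
     */
    static List<Map<Character, Double>> columnDistributions(
            String[] rows, String alphabet, boolean countGaps, double nullWeight) {
        List<Map<Character, Double>> result = new ArrayList<>();
        int width = rows[0].length();
        for (int col = 0; col < width; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            // Pseudo counts go to alphabet symbols only, never to the gap.
            for (char s : alphabet.toCharArray()) {
                counts.put(s, nullWeight);
            }
            for (String row : rows) {
                char s = row.charAt(col);
                if (s != '-' || countGaps) {
                    counts.merge(s, 1.0, Double::sum);
                }
            }
            double total = 0.0;
            for (double c : counts.values()) {
                total += c;
            }
            for (Map.Entry<Character, Double> e : counts.entrySet()) {
                e.setValue(e.getValue() / total);
            }
            result.add(counts);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {"ac", "a-", "gc"};
        System.out.println(columnDistributions(rows, "acgt", true, 1.0));
    }
}
```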
    • distOverAlignment

      public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps) throws IllegalAlphabetException
      Creates an array of distributions, one for each column of the alignment. No pseudo counts are used.
      Parameters:
      a - the Alignment to build the Distribution[] over.
      countGaps - if true, gaps will be included in the distributions
      Returns:
      a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
      Throws:
      IllegalAlphabetException - if the alignment is not composed from sequences all with the same alphabet
      Since:
      1.2
    • average

      public static final Distribution average(Distribution[] dists)
      Averages two or more distributions. NOTE: the current implementation ignores the null model.
      Parameters:
      dists - the Distributions to average
      Returns:
      a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
      Since:
      1.2
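The averaging described above amounts to a per-symbol arithmetic mean. A minimal sketch with plain Java maps follows; it assumes every input map uses the same symbols, and AverageSketch and average here are hypothetical stand-ins, not the library method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AverageSketch {
    /** Mean weight of each symbol across all input distributions. */
    static Map<String, Double> average(List<Map<String, Double>> dists) {
        Map<String, Double> avg = new LinkedHashMap<>();
        for (Map<String, Double> d : dists) {
            for (Map.Entry<String, Double> e : d.entrySet()) {
                // Add this distribution's share of the mean.
                avg.merge(e.getKey(), e.getValue() / dists.size(), Double::sum);
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<>();
        first.put("a", 1.0);
        first.put("c", 0.0);
        Map<String, Double> second = new LinkedHashMap<>();
        second.put("a", 0.5);
        second.put("c", 0.5);
        List<Map<String, Double>> dists = new ArrayList<>();
        dists.add(first);
        dists.add(second);
        System.out.println(average(dists));
    }
}
```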
    • generateSequence

      public static final Sequence generateSequence(String name, Distribution d, int length)
      Produces a sequence by randomly sampling the Distribution.
      Parameters:
      name - the name for the sequence
      d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
      length - the number of symbols in the sequence.
      Returns:
      a Sequence whose name and URN are both set to name, with an empty Annotation.
    • generateSymbolList

      public static final SymbolList generateSymbolList(Distribution d, int length)
      Produces a SymbolList by randomly sampling a Distribution.
      Parameters:
      d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and used to produce a sequence over the conditioned alphabet.
      length - the number of symbols in the sequence.
      Returns:
      a SymbolList of length length
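Random sampling from a simple (order 0) distribution can be sketched with cumulative weights, as below. The order-N case with its burn-in is not modelled, and SamplingSketch and sample are hypothetical names; this is an illustration of the general technique, not the library's code.

```java
import java.util.Random;

final class SamplingSketch {
    /**
     * Draws length symbols independently, choosing symbols[j] with
     * probability weights[j]. Assumes the weights sum to 1.
     */
    static String sample(char[] symbols, double[] weights, int length, Random rng) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble(); // uniform in [0, 1)
            double cumulative = 0.0;
            // Fallback to the last symbol guards against rounding error.
            char chosen = symbols[symbols.length - 1];
            for (int j = 0; j < symbols.length; j++) {
                cumulative += weights[j];
                if (r < cumulative) {
                    chosen = symbols[j];
                    break;
                }
            }
            sb.append(chosen);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] dna = {'a', 'c', 'g', 't'};
        double[] weights = {0.4, 0.1, 0.1, 0.4};
        System.out.println(sample(dna, weights, 30, new Random()));
    }
}
```

Passing a seeded Random makes the generated sequence reproducible, which is convenient in tests.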
    • generateOrderNSequence

      protected static final Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
      Deprecated.
      Use generateSequence() or generateSymbolList() instead.
      Generate a sequence by sampling a distribution.
      Parameters:
      name - the name of the sequence
      d - the distribution to sample
      length - the length of the sequence
      Returns:
      a new sequence with the required composition