Package org.biojava.bio.dist
Class DistributionTools
java.lang.Object
org.biojava.bio.dist.DistributionTools
A class to hold static methods for calculations and manipulations using
Distributions.
- Since: 1.2
- Author: Mark Schreiber, Matthew Pocock
-
Method Summary
Modifier and Type  Method
    Description

static final boolean  areEmissionSpectraEqual(Distribution[] a, Distribution[] b)
    Compares the emission spectra of two distribution arrays.
static final boolean  areEmissionSpectraEqual(Distribution a, Distribution b)
    Compares the emission spectra of two distributions.
static final Distribution  average(Distribution[] dists)
    Averages two or more distributions.
static final double  bitsOfInformation(Distribution observed)
    Calculates the total bits of information for a distribution.
static Distribution  countToDistribution(Count c)
    Makes a distribution from a count.
static Distribution[]  distOverAlignment(Alignment a)
    Equivalent to distOverAlignment(a, false, 0.0).
static final Distribution[]  distOverAlignment(Alignment a, boolean countGaps)
    Creates an array of distributions, one for each column of the alignment.
static final Distribution[]  distOverAlignment(Alignment a, boolean countGaps, double nullWeight)
    Creates an array of distributions, one for each column of the alignment.
protected static final Sequence  generateOrderNSequence(String name, OrderNDistribution d, int length)
    Deprecated. Use generateSequence() or generateSymbolList() instead.
static final Sequence  generateSequence(String name, Distribution d, int length)
    Produces a sequence by randomly sampling the Distribution.
static final SymbolList  generateSymbolList(Distribution d, int length)
    Produces a SymbolList by randomly sampling a Distribution.
static final Distribution  jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols)
    Creates a joint distribution.
static final HashMap  KLDistance(Distribution observed, Distribution expected, double logBase)
    A method to calculate the Kullback-Leibler distance (relative entropy).
static void  randomizeDistribution(Distribution d)
    Randomizes the weights of a Distribution.
static Distribution  readFromXML(InputStream is)
    Reads a distribution from XML.
static final HashMap  shannonEntropy(Distribution observed, double logBase)
    A method to calculate the Shannon entropy for a Distribution.
static double  totalEntropy(Distribution observed)
    Calculates the total entropy for a Distribution.
static void  writeToXML(Distribution d, OutputStream os)
    Writes a Distribution to XML that can be read with the readFromXML method.
-
Method Details
-
writeToXML
Writes a Distribution to XML that can be read with the readFromXML method.
- Parameters:
d - the Distribution to write.
os - where to write it to.
- Throws:
IOException - if writing fails.
-
readFromXML
Reads a distribution from XML.
- Parameters:
is - an InputStream to read from.
- Returns:
a Distribution parameterised by the XML in is.
- Throws:
IOException - if reading from is fails.
SAXException - if the contents of is could not be processed as XML.
-
randomizeDistribution
Randomizes the weights of a Distribution.
- Parameters:
d - the Distribution to randomize.
- Throws:
ChangeVetoException - if the Distribution is locked.
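The description above does not say how the new weights are drawn. As a rough illustration of what "randomizing the weights" amounts to, the sketch below gives every symbol a uniform random value and then rescales so the weights sum to 1. This is an assumption, not BioJava's documented algorithm. A plain Map<Character, Double> stands in for a BioJava Distribution, and the class name RandomWeightsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: a "distribution" is modelled as a symbol -> weight map.
class RandomWeightsSketch {

    // Assigns a uniform random value to every symbol, then normalizes so the weights sum to 1.
    static Map<Character, Double> randomize(List<Character> alphabet, Random rng) {
        Map<Character, Double> weights = new LinkedHashMap<>();
        double total = 0.0;
        for (char s : alphabet) {
            double w = rng.nextDouble(); // in [0, 1)
            weights.put(s, w);
            total += w;
        }
        for (Map.Entry<Character, Double> e : weights.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<Character, Double> d = randomize(List.of('A', 'C', 'G', 'T'), new Random(42));
        System.out.println(d);
    }
}
```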
-
countToDistribution
Makes a distribution from a count.
- Parameters:
c - the count.
- Returns:
a Distribution over the same FiniteAlphabet as c, trained with the counts of c.
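Training a distribution from a count amounts, in the simplest case, to dividing each symbol's count by the total. The sketch below assumes plain maximum-likelihood normalization with no pseudo counts. It uses a Map<Character, Double> of counts in place of a BioJava Count, and the class name CountSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: normalize raw symbol counts into probabilities.
class CountSketch {

    static Map<Character, Double> countToDistribution(Map<Character, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counts must contain at least one positive value");
        }
        Map<Character, Double> dist = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : counts.entrySet()) {
            // Each weight is the symbol's share of the total count.
            dist.put(e.getKey(), e.getValue() / total);
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<Character, Double> counts = new LinkedHashMap<>();
        counts.put('A', 3.0);
        counts.put('C', 1.0);
        counts.put('G', 0.0);
        counts.put('T', 4.0);
        System.out.println(countToDistribution(counts)); // {A=0.375, C=0.125, G=0.0, T=0.5}
    }
}
```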
-
areEmissionSpectraEqual
public static final boolean areEmissionSpectraEqual(Distribution a, Distribution b) throws BioException
Compares the emission spectra of two distributions.
- Parameters:
a - a Distribution with the same Alphabet as b.
b - a Distribution with the same Alphabet as a.
- Returns:
true if the alphabets and symbol weights of the two distributions are equal.
- Throws:
BioException - if one or both of the Distributions are over infinite alphabets.
- Since: 1.2
-
areEmissionSpectraEqual
public static final boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b) throws BioException
Compares the emission spectra of two distribution arrays.
- Parameters:
a - a Distribution[] consisting of Distributions over a FiniteAlphabet.
b - a Distribution[] consisting of Distributions over a FiniteAlphabet.
- Returns:
true if the alphabets and symbol weights are equal for each pair of distributions; false if the arrays are of unequal length.
- Throws:
BioException - if one of the Distributions is over an infinite alphabet.
- Since: 1.3
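Both areEmissionSpectraEqual overloads compare alphabets and per-symbol weights, and the array form also requires equal lengths. The sketch below shows that comparison under several assumptions. Distributions are modelled as Map<Character, Double>, "same alphabet" means the same key set, and weights are compared with exact numeric equality, which matches the wording "symbol weights are equal". BioJava may compare differently. The class name EmissionSpectraSketch is hypothetical, and a List stands in for the array.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: distributions are symbol -> weight maps.
class EmissionSpectraSketch {

    // Equal when both cover the same symbols (same "alphabet") with identical weights.
    static boolean equal(Map<Character, Double> a, Map<Character, Double> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (Map.Entry<Character, Double> e : a.entrySet()) {
            if (e.getValue().doubleValue() != b.get(e.getKey()).doubleValue()) {
                return false;
            }
        }
        return true;
    }

    // Array version: false on unequal length, otherwise every pair must be equal.
    static boolean equalAll(List<Map<Character, Double>> a, List<Map<Character, Double>> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!equal(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```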
-
KLDistance
public static final HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
A method to calculate the Kullback-Leibler distance (relative entropy).
- Parameters:
observed - the observed frequency of Symbols.
expected - the expected or background frequency.
logBase - the log base for the entropy calculation; 2 is standard.
- Returns:
a HashMap mapping each Symbol to its (Double) relative entropy.
- Since: 1.2
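The per-symbol relative entropy is conventionally p(s) * log_b(p(s) / q(s)). Here p is the observed weight, q is the expected weight, and b is logBase. Summing the map's values gives the total divergence.

The sketch below assumes that standard formula and returns a per-symbol map, like the HashMap described above. Symbols with p(s) = 0 contribute 0, which is the usual convention. BioJava's treatment of zero weights is not stated here. KLSketch is a hypothetical name, and Map<Character, Double> stands in for Distribution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-symbol Kullback-Leibler (relative entropy) terms.
class KLSketch {

    // Assumes observed and expected cover the same symbols.
    static Map<Character, Double> klDistance(Map<Character, Double> observed,
                                             Map<Character, Double> expected,
                                             double logBase) {
        Map<Character, Double> result = new HashMap<>();
        for (Map.Entry<Character, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double q = expected.get(e.getKey());
            // 0 * log(0 / q) is taken as 0; p > 0 with q == 0 yields Infinity.
            double term = (p == 0.0) ? 0.0 : p * (Math.log(p / q) / Math.log(logBase));
            result.put(e.getKey(), term);
        }
        return result;
    }

    // Total divergence is the sum of the per-symbol terms.
    static double sum(Map<Character, Double> perSymbol) {
        double total = 0.0;
        for (double v : perSymbol.values()) {
            total += v;
        }
        return total;
    }
}
```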
-
shannonEntropy
A method to calculate the Shannon entropy for a Distribution.
- Parameters:
observed - the observed frequency of Symbols.
logBase - the log base for the entropy calculation; 2 is standard.
- Returns:
a HashMap mapping each Symbol to its (Double) entropy.
- Since: 1.2
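Each entry of the returned map can be read as one symbol's contribution to the entropy. The sketch below assumes the standard definition -p(s) * log_b p(s), with zero-probability symbols contributing 0. The sign convention BioJava uses is not confirmed by this page. ShannonSketch is a hypothetical name, and Map<Character, Double> stands in for Distribution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-symbol Shannon entropy contributions.
class ShannonSketch {

    static Map<Character, Double> shannonEntropy(Map<Character, Double> observed, double logBase) {
        Map<Character, Double> result = new HashMap<>();
        for (Map.Entry<Character, Double> e : observed.entrySet()) {
            double p = e.getValue();
            // -p * log_b(p); by convention a zero-probability symbol contributes 0.
            double h = (p == 0.0) ? 0.0 : -p * Math.log(p) / Math.log(logBase);
            result.put(e.getKey(), h);
        }
        return result;
    }
}
```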
-
totalEntropy
Calculates the total entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.
- Parameters:
observed - the observed frequency of Symbols.
- Returns:
the total entropy of the Distribution.
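Weighting each symbol's information by its probability gives the familiar H = -sum over s of p(s) * log p(s). totalEntropy takes no logBase argument, and this page does not state which base it uses. The sketch below assumes base 2 (bits). TotalEntropySketch is a hypothetical name, and Map<Character, Double> stands in for Distribution.

```java
import java.util.Map;

// Hypothetical sketch: total Shannon entropy in bits (base 2 is an assumption).
class TotalEntropySketch {

    // H = -sum_s p(s) * log2 p(s); zero-probability symbols contribute nothing.
    static double totalEntropy(Map<Character, Double> observed) {
        double h = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) {
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }
}
```

With this definition, a fair coin has 1 bit of entropy and a distribution that always emits the same symbol has 0.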
-
bitsOfInformation
Calculates the total bits of information for a distribution.
- Parameters:
observed - the observed frequency of Symbols.
- Returns:
the total information content of the Distribution.
- Since: 1.2
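A common definition of information content, used for example in sequence logos, is the maximum possible entropy log2(|alphabet|) minus the observed entropy. The sketch below assumes that definition with no small-sample correction. This page does not confirm the exact formula BioJava uses. InformationSketch is a hypothetical name, and the map's key set stands in for the FiniteAlphabet.

```java
import java.util.Map;

// Hypothetical sketch: information content = log2(alphabet size) - entropy (in bits).
class InformationSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    static double bitsOfInformation(Map<Character, Double> observed) {
        double entropy = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) {
                entropy -= p * log2(p);
            }
        }
        // Maximum entropy for this alphabet size, minus the entropy actually observed.
        return log2(observed.size()) - entropy;
    }
}
```

For DNA, a column that always shows the same base carries 2 bits, and a uniform column carries 0.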
-
distOverAlignment
Equivalent to distOverAlignment(a, false, 0.0).
- Parameters:
a - the Alignment.
- Returns:
an array of Distribution instances representing the columns of the alignment.
- Throws:
IllegalAlphabetException - if the alignment alphabet is not compatible.
-
jointDistOverAlignment
public static final Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols) throws IllegalAlphabetException
Creates a joint distribution.
- Parameters:
a - the Alignment to build the joint Distribution over.
countGaps - if true, gaps will be included in the distributions. (Not yet implemented: currently either option produces the same result.)
nullWeight - the number of pseudo counts to add to each distribution.
cols - the positions in the alignment to include in the joint distribution.
- Returns:
a Distribution.
- Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet.
- Since: 1.2
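A joint distribution over several columns treats the tuple of symbols found at those positions in each sequence as one outcome. The sketch below counts those tuples and normalizes them. It assumes an alignment is a list of equal-length strings and encodes each tuple as a String key. It omits both nullWeight pseudo counts and gap handling. JointDistSketch is a hypothetical name, not a BioJava class.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: joint distribution of the symbols at the selected columns.
class JointDistSketch {

    static Map<String, Double> jointDist(List<String> rows, int[] cols) {
        Map<String, Double> counts = new TreeMap<>();
        for (String row : rows) {
            // Concatenate the symbols at the chosen columns into one joint outcome.
            StringBuilder key = new StringBuilder();
            for (int c : cols) {
                key.append(row.charAt(c));
            }
            counts.merge(key.toString(), 1.0, Double::sum);
        }
        // Normalize counts to probabilities (no pseudo counts in this sketch).
        int n = rows.size();
        counts.replaceAll((k, v) -> v / n);
        return counts;
    }

    public static void main(String[] args) {
        List<String> aln = List.of("ACG", "ACT", "AGT", "TCT");
        System.out.println(jointDist(aln, new int[] {0, 1})); // {AC=0.5, AG=0.25, TC=0.25}
    }
}
```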
-
distOverAlignment
public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight) throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment.
- Parameters:
a - the Alignment to build the Distribution[] over.
countGaps - if true, gaps will be included in the distributions.
nullWeight - the number of pseudo counts to add to each distribution. Pseudo counts are not added to gaps, so a column with no gaps has no gap count.
- Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
- Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet.
- Since: 1.2
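The per-column procedure can be pictured as counting the symbols in each column, optionally adding pseudo counts, and normalizing. The sketch below makes several assumptions:

- The alignment is a list of equal-length strings over a given alphabet string, with '-' as the gap character.
- The nullWeight pseudo counts are spread evenly over the non-gap symbols, so gaps never receive any. BioJava may instead distribute them according to a null model.
- With countGaps set, '-' is counted as an extra outcome.

ColumnDistSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one distribution per alignment column, with optional gaps and pseudo counts.
class ColumnDistSketch {
    static final char GAP = '-';

    // Assumes rows are non-empty, equal-length, and use only alphabet symbols and '-'.
    static List<Map<Character, Double>> distOverAlignment(List<String> rows, String alphabet,
                                                          boolean countGaps, double nullWeight) {
        int width = rows.get(0).length();
        List<Map<Character, Double>> dists = new ArrayList<>();
        for (int col = 0; col < width; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            // Every real symbol starts with an equal share of the pseudo counts.
            for (char s : alphabet.toCharArray()) {
                counts.put(s, nullWeight / alphabet.length());
            }
            if (countGaps) {
                counts.put(GAP, 0.0); // gaps get no pseudo counts
            }
            for (String row : rows) {
                char s = row.charAt(col);
                if (s == GAP && !countGaps) {
                    continue; // gaps ignored entirely
                }
                counts.merge(s, 1.0, Double::sum);
            }
            double total = 0.0;
            for (double c : counts.values()) {
                total += c;
            }
            final double t = total;
            counts.replaceAll((k, v) -> t == 0.0 ? 0.0 : v / t);
            dists.add(counts);
        }
        return dists;
    }
}
```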
-
distOverAlignment
public static final Distribution[] distOverAlignment(Alignment a, boolean countGaps) throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment. No pseudo counts are used.
- Parameters:
a - the Alignment to build the Distribution[] over.
countGaps - if true, gaps will be included in the distributions.
- Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
- Throws:
IllegalAlphabetException - if the alignment is not composed of sequences that all use the same alphabet.
- Since: 1.2
-
average
Averages two or more distributions. NOTE: the current implementation ignores the null model.
- Parameters:
dists - the Distributions to average.
- Returns:
a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
- Since: 1.2
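As described, the averaged weight of a symbol is its mean weight across the input distributions, with no null model involved. The sketch below implements that arithmetic mean. It assumes all inputs share an alphabet and uses Map<Character, Double> in place of Distribution. AverageSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: symbol-wise arithmetic mean of several distributions.
class AverageSketch {

    static Map<Character, Double> average(List<Map<Character, Double>> dists) {
        Map<Character, Double> avg = new LinkedHashMap<>();
        // Sum each symbol's weight over all distributions.
        for (Map<Character, Double> d : dists) {
            for (Map.Entry<Character, Double> e : d.entrySet()) {
                avg.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        // Divide by the number of distributions to get the mean weight.
        int n = dists.size();
        avg.replaceAll((s, w) -> w / n);
        return avg;
    }
}
```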
-
generateSequence
Produces a sequence by randomly sampling the Distribution.
- Parameters:
name - the name for the sequence.
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and then used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
- Returns:
a Sequence whose name and URN are both set to name, with an empty Annotation.
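The burn-in described for order-N distributions lets a Markov chain forget its arbitrary starting state before any output is kept. The sketch below illustrates that idea for order 1 only. It assumes a map from the previous symbol to the conditional distribution of the next one, a uniformly chosen start state, and the 1000-step burn-in mentioned above. MarkovSampleSketch is a hypothetical name and does not reproduce BioJava's OrderNDistribution handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: sampling from an order-1 Markov chain with a burn-in period.
class MarkovSampleSketch {
    static final int BURN_IN = 1000;

    // Inverse-CDF draw from one conditional distribution.
    static char draw(Map<Character, Double> dist, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        char last = 0;
        for (Map.Entry<Character, Double> e : dist.entrySet()) {
            if (e.getValue() <= 0.0) {
                continue; // never emit zero-weight symbols
            }
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        return last; // guards against rounding when weights sum to slightly under 1
    }

    static String generate(Map<Character, Map<Character, Double>> transitions, int length, Random rng) {
        List<Character> states = new ArrayList<>(transitions.keySet());
        char current = states.get(rng.nextInt(states.size()));
        // Burn in: run the chain without recording output.
        for (int i = 0; i < BURN_IN; i++) {
            current = draw(transitions.get(current), rng);
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            current = draw(transitions.get(current), rng);
            sb.append(current);
        }
        return sb.toString();
    }
}
```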
-
generateSymbolList
Produces a SymbolList by randomly sampling a Distribution.
- Parameters:
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and then used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
- Returns:
a SymbolList of length length.
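For an order-0 distribution, one standard way to sample each symbol is inverse-transform sampling over a cumulative weight table. The sketch below does this with a binary search over that table, which keeps each draw logarithmic in the alphabet size. This is a swapped-in technique, not necessarily what BioJava does. Plain char[] and double[] arrays stand in for a FiniteAlphabet and Distribution. The order-N burn-in case is not modelled, and SymbolSampleSketch is a hypothetical name.

```java
import java.util.Random;

// Hypothetical sketch: i.i.d. sampling from an order-0 distribution via a cumulative table.
class SymbolSampleSketch {

    static String generate(char[] symbols, double[] weights, int length, Random rng) {
        // cdf[i] holds the sum of weights[0..i].
        double[] cdf = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cdf[i] = running;
        }
        StringBuilder sb = new StringBuilder(length);
        for (int n = 0; n < length; n++) {
            // Scale by the total so weights need not sum exactly to 1; keep u strictly below it.
            double u = Math.min(rng.nextDouble() * running, Math.nextDown(running));
            // Binary search for the smallest index whose cumulative weight exceeds u.
            int lo = 0;
            int hi = cdf.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cdf[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            sb.append(symbols[lo]);
        }
        return sb.toString();
    }
}
```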
-
generateOrderNSequence
protected static final Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
Deprecated. Use generateSequence() or generateSymbolList() instead.
Generates a sequence by sampling a distribution.
- Parameters:
name - the name of the sequence.
d - the distribution to sample.
length - the length of the sequence.
- Returns:
a new sequence with the required composition.
-