Package pal.alignment
Class AlignmentUtils
java.lang.Object
pal.alignment.AlignmentUtils
Helper utilities for alignments.
- Version:
- $Id: AlignmentUtils.java,v 1.29 2004/10/14 02:01:43 matt Exp $
- Author:
- Alexei Drummond
* @note
- 14 August 2003 - Changed call to new SimpleAlignment() to reflect change in constructors (referred to not calculating frequencies, but that no longer happens anyhow)
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
static final Alignment concatAlignments(Alignment[] alignments, DataType dt) - Concatenates an array of alignments such that the resulting alignment is all of the sub-alignments placed alongside each other
static final int countUnknowns(Alignment a, DataType dt) - Tests the characters of an alignment to see if there are any characters that are not within a data type.
static double[] estimateCodonFrequenciesF1X4 - Estimate the frequencies of codons, calculated from the average nucleotide frequencies.
static double[] estimateCodonFrequenciesF3X4 - Estimate the frequencies of codons, calculated from the average nucleotide frequencies at the three codon positions.
static double[] estimateFrequencies - count states
static double[][] estimateTupletFrequencies(Alignment a, int tupletSize) - Estimates frequencies via tuplets.
static final void getAlignedSequenceIndices(Alignment a, int i, int[] indices, DataType dataType, int unknownState) - Returns state indices for a sequence.
static final int[][] getAlignedStates(Alignment base) - Unknown characters are given the state of -1
static final int[][] getAlignedStates(Alignment base, int unknownState)
static double getAlignmentPenalty(Alignment a, DataType dataType, TransitionPenaltyTable penalties, double gapCreation, double gapExtension, boolean local) - Returns total sum of pairs alignment distance using gap creation and extension penalties and transition penalties as defined in the TransitionPenaltyTable provided.
static double getAlignmentPenalty(Alignment a, TransitionPenaltyTable penalties, double gapCreation, double gapExtension) - Returns total sum of pairs alignment penalty using gap creation and extension penalties and transition penalties in the TransitionPenaltyTable provided.
static final Alignment getChangedDataType(Alignment a, DataType dt) - Returns an alignment which follows the pattern of the input alignment except that all sites which do not contain states in dt (excluding the gap character) are removed.
static double getConsistency(Alignment a, Alignment b)
static final Alignment getLeadingIncompleteCodonsStripped - Creates a new nucleotide alignment based on the input that has any leading incomplete codons (that is, the first codon of the sequence that is not a gap/unknown but is not complete - has a nucleotide unknown) replaced by a triplet of unknowns
static void getPositionMisalignmentInfo(Alignment a, PrintWriter out, int startingCodonPosition)
static void getPositionMisalignmentInfo(Alignment a, PrintWriter out, int startingCodonPosition, CodonTable translator, boolean removeIncompleteCodons)
static final char[] getSequenceCharArray(Alignment a, int sequence) - Returns a particular sequence of an alignment as a char array
static final String getSequenceString(Alignment a, int sequence) - Returns a particular sequence of an alignment as a String
static DataType getSuitableInstance(char[][] sequences) - guess data type suitable for a given sequence data set
static DataType getSuitableInstance(String[] sequences) - guess data type suitable for a given sequence data set
static DataType getSuitableInstance(Alignment alignment) - guess data type suitable for a given sequence data set
static final boolean isGap - Returns true if the alignment has a gap at the site in the sequence specified.
static final boolean isSiteRedundant(Alignment a, int site)
static void print(Alignment a, PrintWriter out) - print alignment (default format: INTERLEAVED)
static void printCLUSTALW(Alignment a, PrintWriter out) - Print alignment (in CLUSTAL W format)
static void printInterleaved(Alignment a, PrintWriter out) - print alignment (in PHYLIP 3.4 INTERLEAVED format)
static void printPlain(Alignment a, PrintWriter out) - print alignment (in plain format)
static void printPlain(Alignment a, PrintWriter out, boolean relaxed) - print alignment (in plain format)
static void printSequential(Alignment a, PrintWriter out) - print alignment (in PHYLIP SEQUENTIAL format)
static final Alignment removeRedundantSites
static void report(Alignment a, PrintWriter out) - Report number of sequences, sites, and data type
-
Constructor Details
-
AlignmentUtils
public AlignmentUtils()
-
-
Method Details
-
report
Report number of sequences, sites, and data type -
print
print alignment (default format: INTERLEAVED) -
printPlain
print alignment (in plain format) -
printPlain
print alignment (in plain format) -
printSequential
print alignment (in PHYLIP SEQUENTIAL format) -
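As an illustration of the PHYLIP sequential layout, the following is a minimal sketch that works on plain string arrays rather than a PAL Alignment. The class PhylipSequentialSketch and its print method are hypothetical names, and the exact spacing and line wrapping that printSequential produces are not documented here, so the output may differ from PAL's. The sketch assumes the usual PHYLIP conventions: a header line with the number of sequences and sites, then one line per sequence beginning with a name padded or truncated to 10 characters.

```java
import java.io.PrintWriter;

final class PhylipSequentialSketch {
    /**
     * Writes names and sequences in a PHYLIP-style sequential layout.
     * Hypothetical helper: operates on string arrays, not on a PAL Alignment.
     */
    static void print(String[] names, String[] sequences, PrintWriter out) {
        // Header: number of sequences, then number of sites.
        out.println(" " + sequences.length + " " + sequences[0].length());
        for (int i = 0; i < sequences.length; i++) {
            // Names are left-justified and padded or truncated to exactly 10 characters.
            out.println(String.format("%-10.10s", names[i]) + sequences[i]);
        }
        out.flush();
    }

    public static void main(String[] args) {
        print(new String[] {"human", "chimp"},
              new String[] {"ACGT", "AC-T"},
              new PrintWriter(System.out));
    }
}
```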
printInterleaved
print alignment (in PHYLIP 3.4 INTERLEAVED format) -
printCLUSTALW
Print alignment (in CLUSTAL W format) -
getAlignedSequenceIndices
public static final void getAlignedSequenceIndices(Alignment a, int i, int[] indices, DataType dataType, int unknownState)
Returns state indices for a sequence. -
getAlignedStates
Unknown characters are given the state of -1 -
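The idea of mapping characters to state indices, with unknown characters given a sentinel state such as -1, can be sketched as follows. This is not PAL's implementation: the class AlignedStatesSketch, the method toStates, and the fixed "ACGT" state order are assumptions made for illustration, whereas PAL takes the state order from the alignment's DataType.

```java
import java.util.Arrays;

final class AlignedStatesSketch {
    // Hypothetical nucleotide alphabet; a PAL DataType defines its own state order.
    private static final String STATES = "ACGT";

    /** Maps each character to its index in STATES, or to unknownState if absent. */
    static int[][] toStates(String[] sequences, int unknownState) {
        int[][] states = new int[sequences.length][];
        for (int i = 0; i < sequences.length; i++) {
            String seq = sequences[i];
            states[i] = new int[seq.length()];
            for (int j = 0; j < seq.length(); j++) {
                int index = STATES.indexOf(Character.toUpperCase(seq.charAt(j)));
                states[i][j] = index >= 0 ? index : unknownState;
            }
        }
        return states;
    }

    public static void main(String[] args) {
        // Gaps and ambiguity codes fall outside STATES and become -1.
        System.out.println(Arrays.deepToString(toStates(new String[] {"AC-T", "GNTA"}, -1)));
    }
}
```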
getAlignedStates
-
getAlignmentPenalty
public static double getAlignmentPenalty(Alignment a, TransitionPenaltyTable penalties, double gapCreation, double gapExtension)
Returns total sum of pairs alignment penalty using gap creation and extension penalties and transition penalties in the TransitionPenaltyTable provided. By default this is end-weighted. -
getAlignmentPenalty
public static double getAlignmentPenalty(Alignment a, DataType dataType, TransitionPenaltyTable penalties, double gapCreation, double gapExtension, boolean local)
Returns total sum of pairs alignment distance using gap creation and extension penalties and transition penalties as defined in the TransitionPenaltyTable provided. Gap cost is calculated as follows: a gap of length len costs gapCreation + (len-1)*gapExtension
- Parameters:
gapCreation
- the cost of the initial gap opening character
gapExtension
- the cost of the remaining gap characters
local
- true if end gaps are ignored, false otherwise
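The gap cost formula above, gapCreation + (len-1)*gapExtension, can be worked through with a small sketch. The class GapCostSketch and both of its methods are hypothetical and only price gap runs within a single sequence; they do not reproduce the sum-of-pairs calculation, the transition penalties, or the local end-gap handling of getAlignmentPenalty.

```java
final class GapCostSketch {
    /** Cost of one gap of the given length: gapCreation + (len - 1) * gapExtension. */
    static double gapCost(int len, double gapCreation, double gapExtension) {
        if (len <= 0) {
            return 0.0; // no gap, no cost (an assumption of this sketch)
        }
        return gapCreation + (len - 1) * gapExtension;
    }

    /** Sums the cost of every run of '-' characters in one gapped sequence. */
    static double totalGapCost(String seq, double gapCreation, double gapExtension) {
        double total = 0.0;
        int run = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == '-') {
                run++;
            } else {
                total += gapCost(run, gapCreation, gapExtension);
                run = 0;
            }
        }
        // Account for a gap run that reaches the end of the sequence.
        return total + gapCost(run, gapCreation, gapExtension);
    }

    public static void main(String[] args) {
        System.out.println(gapCost(4, 10.0, 1.0));
        System.out.println(totalGapCost("AC--GT---A", 10.0, 1.0));
    }
}
```

With gapCreation = 10 and gapExtension = 1, a gap of length 4 costs 10 + 3*1 = 13.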
-
getSuitableInstance
guess data type suitable for a given sequence data set
- Parameters:
alignment
- alignment
- Returns:
- suitable DataType object
-
getSuitableInstance
guess data type suitable for a given sequence data set
- Parameters:
alignment
- the alignment represented as an array of strings
- Returns:
- suitable DataType object
-
getSuitableInstance
guess data type suitable for a given sequence data set
- Parameters:
alignment
- the alignment represented as an array of strings
- Returns:
- suitable DataType object
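How the data type is guessed is not described here. One plausible heuristic, shown purely as an illustration, is to check what fraction of the letters are nucleotide codes. The class DataTypeGuessSketch, the Guess enum, and the 0.9 threshold are all assumptions of this sketch and should not be read as the rule getSuitableInstance applies.

```java
final class DataTypeGuessSketch {
    enum Guess { NUCLEOTIDE, AMINO_ACID }

    /**
     * Guesses NUCLEOTIDE when at least 90% of the letters are A, C, G, T or U.
     * The threshold is arbitrary and chosen only for this illustration.
     */
    static Guess guess(String[] sequences) {
        int letters = 0;
        int nucleotides = 0;
        for (String seq : sequences) {
            for (int i = 0; i < seq.length(); i++) {
                char c = Character.toUpperCase(seq.charAt(i));
                if (!Character.isLetter(c)) {
                    continue; // ignore gaps and other symbols
                }
                letters++;
                if ("ACGTU".indexOf(c) >= 0) {
                    nucleotides++;
                }
            }
        }
        boolean mostlyNucleotide = letters > 0 && nucleotides >= 0.9 * letters;
        return mostlyNucleotide ? Guess.NUCLEOTIDE : Guess.AMINO_ACID;
    }

    public static void main(String[] args) {
        System.out.println(guess(new String[] {"ACGT-ACGT", "ACGTTACGA"}));
        System.out.println(guess(new String[] {"MKVLAAGIW", "MKV-AAGLW"}));
    }
}
```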
-
estimateCodonFrequenciesF1X4
Estimate the frequencies of codons, calculated from the average nucleotide frequencies. As for CodonFreq = F1X4 (1) in PAML
- Parameters:
a
- The base alignment, will be converted to nucleotides
- Returns:
- The codon frequencies as estimated by the average of nucleotide frequencies
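The F1X4 idea, in which each codon frequency is the product of the overall frequencies of its three nucleotides, can be sketched as below. The class F1X4Sketch, the A, C, G, T ordering, and the codon index i*16 + j*4 + k are assumptions for illustration. The sketch covers all 64 triplets and does not exclude stop codons or renormalise, so its output may not match what estimateCodonFrequenciesF1X4 returns.

```java
final class F1X4Sketch {
    /**
     * Codon frequencies as products of overall nucleotide frequencies.
     * nucFreqs holds four frequencies in a fixed order (for example A, C, G, T);
     * the codon index i * 16 + j * 4 + k is an assumption of this sketch.
     */
    static double[] codonFrequencies(double[] nucFreqs) {
        double[] codons = new double[64];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    codons[i * 16 + j * 4 + k] = nucFreqs[i] * nucFreqs[j] * nucFreqs[k];
                }
            }
        }
        return codons;
    }

    public static void main(String[] args) {
        double[] codons = codonFrequencies(new double[] {0.25, 0.25, 0.25, 0.25});
        System.out.println(codons[0]); // every codon gets 0.25^3 under uniform frequencies
    }
}
```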
-
estimateCodonFrequenciesF3X4
Estimate the frequencies of codons, calculated from the average nucleotide frequencies at the three codon positions. As for CodonFreq = F3X4 (2) in PAML
- Parameters:
a
- The base alignment, will be converted to nucleotides
- Returns:
- The codon frequencies as estimated by the average of nucleotide frequencies
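For F3X4, each codon position has its own nucleotide frequencies, and a codon's frequency is the product of the position-specific frequencies of its three nucleotides. The sketch below illustrates this under the same caveats as any simplified example: the class F3X4Sketch, the nucleotide order, and the codon indexing are hypothetical, and stop codons are neither excluded nor renormalised.

```java
final class F3X4Sketch {
    /**
     * posFreqs[p][n] is the frequency of nucleotide n at codon position p (p = 0, 1, 2).
     * The codon index i * 16 + j * 4 + k is an assumption of this sketch.
     */
    static double[] codonFrequencies(double[][] posFreqs) {
        double[] codons = new double[64];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    codons[i * 16 + j * 4 + k] =
                            posFreqs[0][i] * posFreqs[1][j] * posFreqs[2][k];
                }
            }
        }
        return codons;
    }

    public static void main(String[] args) {
        double[][] posFreqs = {
            {0.4, 0.1, 0.3, 0.2},
            {0.25, 0.25, 0.25, 0.25},
            {0.1, 0.4, 0.4, 0.1}
        };
        System.out.println(codonFrequencies(posFreqs)[0]);
    }
}
```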
-
estimateFrequencies
count states -
estimateTupletFrequencies
Estimates frequencies via tuplets. This is most useful for nucleotide data where the frequencies at each codon position are of interest (tuplet size = 3)
- Parameters:
a
- The input alignment
tupletSize
- the size of the tuplet
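A minimal sketch of per-position frequency estimation with a tuplet size of 3 follows. It assumes that the returned double[][] is indexed by position within the tuplet and then by state, which the description above suggests but does not state outright. The class TupletFrequencySketch, the "ACGT" state order, and the choice to skip gaps and unknowns are illustrative assumptions.

```java
final class TupletFrequencySketch {
    private static final String STATES = "ACGT"; // hypothetical state order

    /** result[p][s] is the relative frequency of state s at position p within a tuplet. */
    static double[][] estimate(String[] sequences, int tupletSize) {
        double[][] freqs = new double[tupletSize][STATES.length()];
        double[] totals = new double[tupletSize];
        for (String seq : sequences) {
            for (int site = 0; site < seq.length(); site++) {
                int state = STATES.indexOf(seq.charAt(site));
                if (state < 0) {
                    continue; // skip gaps and unknown characters
                }
                int pos = site % tupletSize;
                freqs[pos][state]++;
                totals[pos]++;
            }
        }
        // Convert counts to relative frequencies, position by position.
        for (int p = 0; p < tupletSize; p++) {
            for (int s = 0; s < STATES.length(); s++) {
                if (totals[p] > 0) {
                    freqs[p][s] /= totals[p];
                }
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        double[][] f = estimate(new String[] {"ACGACG", "ACTA-G"}, 3);
        System.out.println(java.util.Arrays.deepToString(f));
    }
}
```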
-
isSiteRedundant
-
removeRedundantSites
-
isGap
Returns true if the alignment has a gap at the site in the sequence specified. -
getPositionMisalignmentInfo
public static void getPositionMisalignmentInfo(Alignment a, PrintWriter out, int startingCodonPosition, CodonTable translator, boolean removeIncompleteCodons)
- Parameters:
startingCodonPosition
- from {0,1,2}, representing codon position of first value in sequences...
translator
- the translator to use for converting codons to amino acids.
removeIncompleteCodons
- removes end codons that are not complete (due to startingPosition, and sequence length).
-
getPositionMisalignmentInfo
public static void getPositionMisalignmentInfo(Alignment a, PrintWriter out, int startingCodonPosition)
- Parameters:
startingCodonPosition
- from {0,1,2}, representing codon position of first value in sequences...
-
concatAlignments
Concatenates an array of alignments such that the resulting alignment is all of the sub-alignments placed alongside each other -
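Placing sub-alignments alongside each other amounts to joining the corresponding rows end to end. The sketch below shows this on plain string arrays. The class ConcatSketch is hypothetical, and requiring every sub-alignment to have the same number of sequences is an assumption of the sketch rather than documented behaviour of concatAlignments.

```java
final class ConcatSketch {
    /**
     * Joins row r of every alignment end to end.
     * Each alignment is an array of equal-length sequence strings.
     */
    static String[] concat(String[][] alignments) {
        if (alignments.length == 0) {
            return new String[0];
        }
        int rows = alignments[0].length;
        StringBuilder[] joined = new StringBuilder[rows];
        for (int r = 0; r < rows; r++) {
            joined[r] = new StringBuilder();
        }
        for (String[] alignment : alignments) {
            if (alignment.length != rows) {
                throw new IllegalArgumentException("alignments differ in sequence count");
            }
            for (int r = 0; r < rows; r++) {
                joined[r].append(alignment[r]);
            }
        }
        String[] result = new String[rows];
        for (int r = 0; r < rows; r++) {
            result[r] = joined[r].toString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] out = concat(new String[][] {{"AC", "A-"}, {"GT", "GG"}});
        System.out.println(String.join("\n", out));
    }
}
```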
getSequenceCharArray
Returns a particular sequence of an alignment as a char array -
getSequenceString
Returns a particular sequence of an alignment as a String -
getChangedDataType
Returns an alignment which follows the pattern of the input alignment except that all sites which do not contain states in dt (excluding the gap character) are removed. The DataType of the returned alignment is dt. -
countUnknowns
Tests the characters of an alignment to see if there are any characters that are not within a data type. -
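Counting characters that fall outside a data type can be sketched with a plain alphabet string standing in for the DataType. The class CountUnknownsSketch and the alphabet parameter are hypothetical, and treating the gap character '-' as valid is an assumption of this sketch; the text above does not say how countUnknowns handles gaps.

```java
final class CountUnknownsSketch {
    /** Counts characters that are neither in the alphabet nor the gap character '-'. */
    static int countUnknowns(String[] sequences, String alphabet) {
        int count = 0;
        for (String seq : sequences) {
            for (int i = 0; i < seq.length(); i++) {
                char c = seq.charAt(i);
                if (c != '-' && alphabet.indexOf(c) < 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countUnknowns(new String[] {"ACGN", "A-XT"}, "ACGT"));
    }
}
```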
getLeadingIncompleteCodonsStripped
Creates a new nucleotide alignment based on the input that has any leading incomplete codons (that is, the first codon of the sequence that is not a gap/unknown but is not complete - has a nucleotide unknown) replaced by a triplet of unknowns
- Parameters:
base
- The basis alignment (of any molecular data type)
- Returns:
- the resulting alignment
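One reading of the description above is: scan a sequence codon by codon, skip codons made up entirely of gaps or unknowns, and if the first remaining codon contains any gap or unknown, replace it with three unknown characters. The sketch below follows that reading for a single sequence string. The class LeadingCodonSketch, the use of '-' and '?' as the gap and unknown characters, and the reading frame starting at index 0 are all assumptions and may differ from what getLeadingIncompleteCodonsStripped does.

```java
final class LeadingCodonSketch {
    private static boolean isMissing(char c) {
        return c == '-' || c == '?'; // assumed gap and unknown characters
    }

    /** Replaces the first non-empty codon with "???" if it is incomplete. */
    static String stripLeadingIncompleteCodon(String seq) {
        char[] chars = seq.toCharArray();
        for (int start = 0; start + 3 <= chars.length; start += 3) {
            int missing = 0;
            for (int k = 0; k < 3; k++) {
                if (isMissing(chars[start + k])) {
                    missing++;
                }
            }
            if (missing == 3) {
                continue; // wholly missing codon: keep scanning
            }
            if (missing > 0) {
                // The first codon with real data is incomplete: blank it out.
                for (int k = 0; k < 3; k++) {
                    chars[start + k] = '?';
                }
            }
            break; // only the first codon with real data is examined
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingIncompleteCodon("----GACGT"));
    }
}
```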
-
getConsistency
- Returns:
- the consistency of homology assignment between two alignments.
-