Package org.biojava.bio.seq
Class DNATools
java.lang.Object
org.biojava.bio.seq.DNATools
Useful functionality for processing DNA sequences.
- Author:
- Matthew Pocock, Keith James (docs), Mark Schreiber, David Huen, Richard Holland
-
Method Summary
Modifier and Type, Method: Description
- static AtomicSymbol a()
- static Symbol b()
- static AtomicSymbol c()
- static Symbol complement(Symbol sym): Complement the symbol.
- static SymbolList complement(SymbolList list): Retrieve a complement view of list.
- static ReversibleTranslationTable complementTable(): Get a translation table for complementing DNA symbols.
- static SymbolList createDNA(String dna): Return a new DNA SymbolList for dna.
- static Sequence createDNASequence(String dna, String name): Return a new DNA Sequence for dna.
- static GappedSequence createGappedDNASequence(String dna, String name): Return a new DNA GappedSequence for dna.
- static Symbol d()
- static char dnaToken(Symbol sym): Get a single-character token for a DNA symbol.
- static SymbolList flip(SymbolList list, StrandedFeature.Strand strand): Return a SymbolList that is reverse-complemented if the strand is negative, and the original one if it is not.
- static Symbol forIndex(int index): Return the symbol for an index; compatible with index.
- static Symbol forSymbol(char token): Retrieve the symbol for a character token.
- static AtomicSymbol g()
- static FiniteAlphabet getCodonAlphabet(): Get the (DNA x DNA x DNA) Alphabet.
- static FiniteAlphabet getDNA(): Return the DNA alphabet.
- static Distribution getDNADistribution(double fractionGC): Return a SimpleDistribution with the specified GC content.
- static FiniteAlphabet getDNAxDNA(): Get the (DNA x DNA) Alphabet.
- static Distribution getDNAxDNADistribution(double fractionGC0, double fractionGC1): Return a (DNA x DNA) cross-product Distribution with the specified GC content in each component Alphabet.
- static Symbol h()
- static int index(Symbol sym): Return an integer index for a symbol; compatible with forIndex.
- static Symbol k()
- static Symbol m()
- static Symbol n()
- static Symbol r()
- static SymbolList reverseComplement(SymbolList list): Retrieve a reverse-complement view of list.
- static Symbol s()
- static AtomicSymbol t()
- static SymbolList toProtein(SymbolList syms): Convenience method that directly converts a DNA sequence to RNA and then to protein.
- static SymbolList toProtein(SymbolList syms, int start, int end): Convenience method to translate a region of a DNA sequence directly into protein.
- static SymbolList toRNA(SymbolList syms): Convert a SymbolList from the DNA Alphabet to the RNA Alphabet.
- static SymbolList transcribeToRNA(SymbolList syms): Transcribe DNA to RNA.
- static Symbol v()
- static Symbol w()
- static Symbol y()
-
Method Details
-
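a, g, c, t, n, m, r, w, s, y, k, v, h, d, b
Return the DNA symbol for the IUB code of the same name: a(), g(), c() and t() return the AtomicSymbols for the four bases, and the remaining methods return ambiguity Symbols.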
-
getDNA
Return the DNA alphabet.
- Returns:
- a flyweight version of the DNA alphabet
-
getDNAxDNA
Gets the (DNA x DNA) Alphabet.
- Returns:
- a flyweight version of the (DNA x DNA) alphabet
-
getCodonAlphabet
Gets the (DNA x DNA x DNA) Alphabet.
- Returns:
- a flyweight version of the (DNA x DNA x DNA) alphabet
-
createDNA
Return a new DNA SymbolList for dna.
- Parameters:
dna
- a String to parse into DNA
- Returns:
- a SymbolList created from dna
- Throws:
IllegalSymbolException
- if dna contains any non-DNA characters
-
createDNASequence
Return a new DNA Sequence for dna.
- Parameters:
dna
- a String to parse into DNA
name
- a String to use as the name
- Returns:
- a Sequence created from dna
- Throws:
IllegalSymbolException
- if dna contains any non-DNA characters
-
createGappedDNASequence
public static GappedSequence createGappedDNASequence(String dna, String name) throws IllegalSymbolException
Return a new DNA GappedSequence for dna.
- Throws:
IllegalSymbolException
-
index
Return an integer index for a symbol; compatible with forIndex.
The index for a symbol is stable across virtual machines and invocations.
- Parameters:
sym
- the Symbol to index
- Returns:
- the index for that symbol
- Throws:
IllegalSymbolException
- if sym is not a member of the DNA alphabet
-
forIndex
Return the symbol for an index; compatible with index.
The index for a symbol is stable across virtual machines and invocations.
- Parameters:
index
- the index to look up
- Returns:
- the symbol at that index
- Throws:
IndexOutOfBoundsException
- if index is not between 0 and 3
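The contract between index and forIndex is a round trip over the four unambiguous bases, with any index outside 0 to 3 rejected. The dependency-free sketch below illustrates that contract on plain chars; it is not BioJava code. The class name DnaIndexSketch, the A, C, G, T ordering, and the use of IllegalArgumentException in place of IllegalSymbolException are assumptions made for illustration, so do not rely on the specific index values it produces.

```java
import java.util.List;

// Hypothetical, dependency-free sketch; not BioJava's DNATools.
final class DnaIndexSketch {
    // Assumed ordering for illustration only; DNATools may order the bases differently.
    private static final List<Character> BASES = List.of('A', 'C', 'G', 'T');

    // Map an unambiguous base to a stable index in 0..3.
    static int index(char base) {
        int i = BASES.indexOf(Character.toUpperCase(base));
        if (i < 0) {
            // Stand-in for IllegalSymbolException.
            throw new IllegalArgumentException("Not an unambiguous DNA base: " + base);
        }
        return i;
    }

    // Inverse of index(): only 0..3 are valid.
    static char forIndex(int index) {
        if (index < 0 || index > 3) {
            throw new IndexOutOfBoundsException("index must be between 0 and 3: " + index);
        }
        return BASES.get(index);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            char base = forIndex(i);
            System.out.println(i + " -> " + base + " -> " + index(base));
        }
    }
}
```

Because the ordering is fixed in a constant, the mapping is the same on every run, which is the property the documentation promises for the real methods.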
-
complement
Complement the symbol.
- Parameters:
sym
- the symbol to complement
- Returns:
- a Symbol that is the complement of sym
- Throws:
IllegalSymbolException
- if sym is not a member of the DNA alphabet
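Complementing an ambiguity symbol means complementing every base it stands for. For example, R (A or G) complements to Y (C or T), and B (not A) complements to V (not T). The sketch below shows that rule on plain chars using the standard IUB/IUPAC nucleotide codes. IubComplementSketch is a hypothetical, BioJava-independent helper rather than the library's implementation, and it throws IllegalArgumentException where DNATools would throw IllegalSymbolException.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, dependency-free sketch; not BioJava's DNATools.
final class IubComplementSketch {
    // IUB/IUPAC nucleotide codes and the bases each stands for, listed in A, C, G, T order.
    private static final Map<Character, String> CODES = Map.ofEntries(
            Map.entry('A', "A"), Map.entry('C', "C"), Map.entry('G', "G"), Map.entry('T', "T"),
            Map.entry('R', "AG"), Map.entry('Y', "CT"), Map.entry('S', "CG"), Map.entry('W', "AT"),
            Map.entry('K', "GT"), Map.entry('M', "AC"), Map.entry('B', "CGT"), Map.entry('D', "AGT"),
            Map.entry('H', "ACT"), Map.entry('V', "ACG"), Map.entry('N', "ACGT"));

    private static char baseComplement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Not a base: " + base);
        }
    }

    static char complement(char code) {
        String bases = CODES.get(Character.toUpperCase(code));
        if (bases == null) {
            // Stand-in for IllegalSymbolException.
            throw new IllegalArgumentException("Not a valid IUB DNA code: " + code);
        }
        // Complement each base; the TreeSet keeps them in A, C, G, T order.
        Set<Character> complemented = new TreeSet<>();
        for (char base : bases.toCharArray()) {
            complemented.add(baseComplement(base));
        }
        StringBuilder key = new StringBuilder();
        for (char base : complemented) {
            key.append(base);
        }
        // Find the code that stands for exactly the complemented set of bases.
        for (Map.Entry<Character, String> entry : CODES.entrySet()) {
            if (entry.getValue().contentEquals(key)) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("The code table is closed under complement");
    }

    public static void main(String[] args) {
        for (char code : "ACGTRYSWKMBDHVN".toCharArray()) {
            System.out.println(code + " -> " + complement(code));
        }
    }
}
```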
-
forSymbol
Retrieve the symbol for a character token.
- Parameters:
token
- the char to look up
- Returns:
- the symbol for that char
- Throws:
IllegalSymbolException
- if the char is not a valid IUB DNA code
-
complement
Retrieve a complement view of list.
- Parameters:
list
- the SymbolList to complement
- Returns:
- a SymbolList that is the complement
- Throws:
IllegalAlphabetException
- if list is not a complementable alphabet
-
reverseComplement
Retrieve a reverse-complement view of list.
- Parameters:
list
- the SymbolList to reverse-complement
- Returns:
- a SymbolList that is the reverse complement
- Throws:
IllegalAlphabetException
- if list is not a complementable alphabet
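The difference between complement and reverseComplement is only whether the residue order is also reversed. The dependency-free sketch below contrasts the two on plain Strings. It is not BioJava code: DNATools returns views backed by the original SymbolList, whereas this hypothetical ReverseComplementSketch builds new Strings and, for brevity, handles only A, C, G, T and N.

```java
// Hypothetical, dependency-free sketch; DNATools returns views, this builds new Strings.
final class ReverseComplementSketch {
    // Complement of a single unambiguous base (plus N); other IUB codes are left out for brevity.
    private static char complementBase(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'N': return 'N';
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    // Complement every position; the order of the residues is unchanged.
    static String complement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            out.append(complementBase(dna.charAt(i)));
        }
        return out.toString();
    }

    // Complement and reverse: the opposite strand, read 5' to 3'.
    static String reverseComplement(String dna) {
        return new StringBuilder(complement(dna)).reverse().toString();
    }

    public static void main(String[] args) {
        String dna = "AACG";
        System.out.println(complement(dna));        // TTGC
        System.out.println(reverseComplement(dna)); // CGTT
    }
}
```

Applying reverseComplement twice returns the original sequence, which is a convenient sanity check for any implementation.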
-
flip
public static SymbolList flip(SymbolList list, StrandedFeature.Strand strand) throws IllegalAlphabetException
Return a SymbolList that is reverse-complemented if the strand is negative, and the original one if it is not.
- Parameters:
list
- the SymbolList to view
strand
- the Strand to use
- Returns:
- the appropriate view of the SymbolList
- Throws:
IllegalAlphabetException
- if list is not a complementable alphabet
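The flip rule reduces to a single branch on the strand. The sketch below shows it on plain Strings; it is not BioJava code. The nested Strand enum is a hypothetical stand-in for StrandedFeature.Strand, and only the negative strand triggers a reverse complement, so both positive and unknown strands return the input unchanged, matching the description above.

```java
// Hypothetical, dependency-free sketch of flip(); not BioJava code.
final class FlipSketch {
    // Stand-in for StrandedFeature.Strand.
    enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

    private static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        // Walk backwards and complement each base.
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default: throw new IllegalArgumentException("Unsupported base: " + dna.charAt(i));
            }
        }
        return out.toString();
    }

    // Only the negative strand is reverse-complemented; anything else comes back unchanged.
    static String flip(String dna, Strand strand) {
        return strand == Strand.NEGATIVE ? reverseComplement(dna) : dna;
    }

    public static void main(String[] args) {
        System.out.println(flip("AACG", Strand.POSITIVE)); // AACG
        System.out.println(flip("AACG", Strand.NEGATIVE)); // CGTT
    }
}
```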
-
complementTable
Get a translation table for complementing DNA symbols.
- Since:
- 1.1
-
dnaToken
Get a single-character token for a DNA symbol.
- Throws:
IllegalSymbolException
- if sym is not a member of the DNA alphabet
-
getDNADistribution
Return a SimpleDistribution with the specified GC content.
- Parameters:
fractionGC
- (G+C) content as a fraction.
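One natural reading of "a distribution of specified GC content" is that fractionGC is split evenly between G and C and the remainder evenly between A and T. The sketch below computes those four weights under that assumption; it is a hypothetical, dependency-free helper (GcDistributionSketch), not the SimpleDistribution that DNATools returns, and the range check is the sketch's own choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, dependency-free sketch of a GC-content distribution; not BioJava code.
final class GcDistributionSketch {
    // Assumption: fractionGC is split evenly between G and C, the remainder evenly between A and T.
    static Map<Character, Double> dnaDistribution(double fractionGC) {
        if (fractionGC < 0.0 || fractionGC > 1.0) {
            throw new IllegalArgumentException("fractionGC must be in [0, 1]: " + fractionGC);
        }
        double gc = fractionGC / 2.0;
        double at = (1.0 - fractionGC) / 2.0;
        Map<Character, Double> weights = new LinkedHashMap<>();
        weights.put('A', at);
        weights.put('C', gc);
        weights.put('G', gc);
        weights.put('T', at);
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(dnaDistribution(0.6));
    }
}
```

With fractionGC = 0.6, G and C each get 0.3 and A and T each get 0.2 (up to floating-point rounding), so the weights sum to 1.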
-
getDNAxDNADistribution
Return a (DNA x DNA) cross-product Distribution with the specified GC content in each component Alphabet.
- Parameters:
fractionGC0
- (G+C) content of first sequence as a fraction.
fractionGC1
- (G+C) content of second sequence as a fraction.
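If the two components are treated as independent, as "cross-product" suggests, the weight of a pair is the product of the two per-base weights. The sketch below enumerates all 16 pairs that way. It assumes, as an illustration only, that each component splits its GC fraction evenly between G and C; PairDistributionSketch and its methods are hypothetical and not part of BioJava.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, dependency-free sketch of a (DNA x DNA) product distribution; not BioJava code.
final class PairDistributionSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    // Per-base weight for one component: G and C share fractionGC, A and T share the rest.
    static double baseWeight(char base, double fractionGC) {
        return (base == 'G' || base == 'C') ? fractionGC / 2.0 : (1.0 - fractionGC) / 2.0;
    }

    // Each of the 16 pairs gets the product of its two independent component weights.
    static Map<String, Double> pairDistribution(double fractionGC0, double fractionGC1) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (char first : BASES) {
            for (char second : BASES) {
                weights.put("" + first + second,
                        baseWeight(first, fractionGC0) * baseWeight(second, fractionGC1));
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(pairDistribution(0.4, 0.6).get("GC"));
    }
}
```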
-
toRNA
Converts a SymbolList from the DNA Alphabet to the RNA Alphabet.
- Parameters:
syms
- the SymbolList to convert to RNA
- Returns:
- a view on syms where Symbols have been converted to RNA. Most significantly, t's are now u's. The 5' to 3' order of the Symbols is conserved.
- Throws:
IllegalAlphabetException
- if syms is not DNA.
- Since:
- 1.4
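At the level of single-letter codes, toRNA amounts to replacing T with U while keeping every other base and the 5' to 3' order. The sketch below illustrates this on plain Strings; ToRnaSketch is hypothetical and not BioJava code (the library returns a view on the original SymbolList rather than a new String).

```java
// Hypothetical, dependency-free sketch of toRNA(); not BioJava code.
final class ToRnaSketch {
    // Replace T with U; every other base and the 5' to 3' order are kept as they are.
    static String toRna(String dna) {
        StringBuilder rna = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            char base = Character.toUpperCase(dna.charAt(i));
            switch (base) {
                case 'A': case 'C': case 'G': rna.append(base); break;
                case 'T': rna.append('U'); break;
                default: throw new IllegalArgumentException("Not a DNA base: " + dna.charAt(i));
            }
        }
        return rna.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRna("ATGCTT")); // AUGCUU
    }
}
```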
-
transcribeToRNA
Transcribes DNA to RNA. This method more closely represents the biological reality than toRNA(SymbolList syms) does. The DNA SymbolList passed in is assumed to be the template strand in the 5' to 3' orientation. The resulting RNA is transcribed from this template, so it is effectively a reverse complement in the RNA alphabet. The method is equivalent to calling reverseComplement() and toRNA() in sequence.
If you are dealing with cDNA sequences that you want converted to RNA, you would be better off calling toRNA(SymbolList syms).
- Parameters:
syms
- the SymbolList to convert to RNA
- Returns:
- a view on syms where Symbols have been converted to RNA.
- Throws:
IllegalAlphabetException
- if syms is not DNA.
- Since:
- 1.4
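Because transcription reads the template strand 3' to 5', reverse-complementing and then replacing T with U gives the same result as walking the template backwards and emitting the pairing RNA base. The sketch below does the latter on plain Strings. TranscribeSketch is a hypothetical, dependency-free illustration, not BioJava's implementation.

```java
// Hypothetical, dependency-free sketch of transcribeToRNA(); not BioJava code.
final class TranscribeSketch {
    // Read the template strand 3' to 5' and emit the complementary RNA base,
    // which is the same as reverse-complementing and then replacing T with U.
    static String transcribe(String template) {
        StringBuilder rna = new StringBuilder(template.length());
        for (int i = template.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(template.charAt(i))) {
                case 'A': rna.append('U'); break;
                case 'T': rna.append('A'); break;
                case 'C': rna.append('G'); break;
                case 'G': rna.append('C'); break;
                default: throw new IllegalArgumentException("Not a DNA base: " + template.charAt(i));
            }
        }
        return rna.toString();
    }

    public static void main(String[] args) {
        // Template 5'-CAT-3' pairs with the coding strand 5'-ATG-3', so the mRNA reads AUG.
        System.out.println(transcribe("CAT")); // AUG
    }
}
```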
-
toProtein
Convenience method that directly converts a DNA sequence to RNA and then to protein. The translated protein is from the +1 reading frame of the SymbolList. The whole SymbolList is translated, although up to 2 DNA residues may be truncated if full codons cannot be formed.
- Parameters:
syms
- the sequence to be translated.
- Returns:
- the translated protein sequence.
- Throws:
IllegalAlphabetException
- if syms is not from the DNA alphabet.
- Since:
- 1.5.1
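Translating the +1 frame means reading non-overlapping codons from the first residue and dropping the 1 or 2 trailing residues that cannot form a full codon. The sketch below shows this using the NCBI standard genetic code (translation table 1) packed into a 64-character string indexed in T, C, A, G order. It is a hypothetical, dependency-free illustration (TranslateSketch), not BioJava code: it skips the intermediate RNA step, accepts only unambiguous bases, and writes stop codons as '*'.

```java
// Hypothetical, dependency-free sketch of +1-frame translation; not BioJava code.
final class TranslateSketch {
    // NCBI standard genetic code (table 1); codon index = 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3.
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    private static int baseIndex(char base) {
        switch (Character.toUpperCase(base)) {
            case 'T': case 'U': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default: throw new IllegalArgumentException("Not an unambiguous base: " + base);
        }
    }

    // Translate frame +1; the 1 or 2 trailing residues that cannot form a codon are dropped.
    static String toProtein(String dna) {
        int codons = dna.length() / 3;
        StringBuilder protein = new StringBuilder(codons);
        for (int c = 0; c < codons; c++) {
            int i = 3 * c;
            int idx = 16 * baseIndex(dna.charAt(i))
                    + 4 * baseIndex(dna.charAt(i + 1))
                    + baseIndex(dna.charAt(i + 2));
            protein.append(AMINO_ACIDS.charAt(idx));
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        // ATG GCC TAA, then the trailing GG is dropped.
        System.out.println(toProtein("ATGGCCTAAGG")); // MA*
    }
}
```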
-
toProtein
public static SymbolList toProtein(SymbolList syms, int start, int end) throws IllegalAlphabetException
Convenience method to translate a region of a DNA sequence directly into protein. The start and end can be specified, but if the length of the specified region is not evenly divisible by three, the translated region is truncated so that it ends on a complete terminal codon.
- Parameters:
syms
- the DNA sequence to be translated.
start
- the location to begin translation.
end
- the end of the translated region.
- Returns:
- the translated protein sequence.
- Throws:
IllegalAlphabetException
- if syms is not from the DNA alphabet.
- Since:
- 1.5.1
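The truncation rule for a region can be expressed as pulling end back by (length mod 3). The sketch below computes the trimmed coordinates. It assumes, as an unverified reading of this method, that start and end are 1-based and inclusive, which is the general SymbolList convention. RegionTrimSketch is hypothetical and not part of BioJava.

```java
// Hypothetical, dependency-free sketch of region trimming; not BioJava code.
final class RegionTrimSketch {
    // Assumes 1-based, inclusive coordinates (the SymbolList convention).
    // Returns {start, trimmedEnd}; the trimmed region's length is a multiple of 3
    // and may be empty (trimmedEnd == start - 1) if the region is shorter than a codon.
    static int[] trimToWholeCodons(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("Need 1 <= start <= end, got " + start + ".." + end);
        }
        int length = end - start + 1;
        return new int[] {start, end - (length % 3)};
    }

    public static void main(String[] args) {
        int[] region = trimToWholeCodons(1, 10);
        System.out.println(region[0] + ".." + region[1]); // 1..9
    }
}
```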
-