
Module insert_mutations

Created 2012. Core script for the generation of the artificial reference genome.

The purpose of these functions is to go through a list of positions and find balanced mutations that fulfill the requirements of the artificial reference genome. Once an initial start position has been selected at random, all possible triplets at Hamming distance 1 are generated and looked up in a dictionary that contains all triplet positions in the input genome. If a suitable partner is found for the initial mutation, the next start position is chosen at random. Otherwise, the remaining triplets at Hamming distance 1 are tried until none are left. This process can be accelerated by allowing unbalanced mutations, but doing so introduces differences in the nucleotide/amino acid (NUC/AA) distribution and in the AA neighborhood.
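
The partner search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's code: the helper names build_triplet_index, hamming1_neighbours, find_balanced_pair and apply_balanced_pair are hypothetical, triplets are read in frame from position 0, and "balancing" is taken to mean that a partner triplet equal to the mutated one is changed back to the original triplet, so triplet and nucleotide counts stay the same.

```python
import random
from collections import defaultdict

BASES = "TCAG"


def build_triplet_index(dna):
    """Map every in-frame triplet to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(0, len(dna) - len(dna) % 3, 3):
        index[dna[i:i + 3]].append(i)
    return index


def hamming1_neighbours(triplet):
    """All triplets that differ from `triplet` at exactly one position."""
    return [triplet[:k] + b + triplet[k + 1:]
            for k in range(3) for b in BASES if b != triplet[k]]


def find_balanced_pair(dna, index, start, rng):
    """Try the Hamming-1 variants of the triplet at `start` in random order.

    Returns (new_triplet, partner_position) for the first variant that also
    occurs in the genome, or None once every variant has been tried.
    """
    candidates = hamming1_neighbours(dna[start:start + 3])
    rng.shuffle(candidates)
    for new in candidates:
        partners = index.get(new, [])
        if partners:
            return new, rng.choice(partners)
    return None


def apply_balanced_pair(dna, start, new, partner):
    """Mutate `start` to `new` and mutate the partner back to the old triplet."""
    old = dna[start:start + 3]
    seq = list(dna)
    seq[start:start + 3] = new
    seq[partner:partner + 3] = old
    return "".join(seq)


dna = "ATGAAGTTGCCC"
index = build_triplet_index(dna)
pair = find_balanced_pair(dna, index, 0, random.Random(1))
print(pair, apply_balanced_pair(dna, 0, *pair))
```

Keeping the positions in a dictionary turns each partner lookup into a single hash access instead of a scan over the whole genome.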


Author: Sven Giese

Functions
getMutation(AA, Codon) -> list, list
  Returns a random mutation for a given AA and its codon (DNA).
getdifference(triplet_old, triplet_new) -> Char, Char, int
  Given two triplets, returns the differences between them plus the position of the difference.
isvalidposition(pdic, iprime, distance) -> Bool
  Checks if a position is valid for mutation.
mutate_random(DNA, AminoAcid, distance, pdic, rev, header, Random, outputpath) -> list
  Mutates a given DNA (AminoAcid) genome sequence at several positions; the spacing between mutations is governed by the distance parameter.
Variables
  __package__ = 'core'
Function Details

getMutation(AA, Codon)

Returns a random mutation for a given AA and its codon (DNA). The mutation is chosen so as to preserve the equilibrium of the nucleotide distribution: only codons at Hamming distance 1 are considered as possible mutations.

Parameters:
  • AA (string) - Single AA.
  • Codon (string) - Three-letter DNA codon.
Returns: list,list
A list of all valid mutations (triplets) and a list of the corresponding AAs.
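
A minimal sketch of the Hamming-distance-1 candidate generation described for getMutation, assuming the standard genetic code (NCBI translation table 1). The name hamming1_mutations is hypothetical, and the filter that drops synonymous changes and stop codons is an assumption about what counts as a "valid" mutation; the module's own rules may differ.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def hamming1_mutations(aa, codon):
    """Return (codons, amino_acids) for all single-substitution variants of
    `codon` that change the amino acid and do not create a stop codon
    (the filter is an assumption, not taken from the module)."""
    codons, aas = [], []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            new = codon[:pos] + base + codon[pos + 1:]
            new_aa = CODON_TABLE[new]
            if new_aa != aa and new_aa != "*":
                codons.append(new)
                aas.append(new_aa)
    return codons, aas


codons, aas = hamming1_mutations("M", "ATG")
print(codons)  # ['TTG', 'CTG', 'GTG', 'ACG', 'AAG', 'AGG', 'ATT', 'ATC', 'ATA']
print(aas)     # ['L', 'L', 'V', 'T', 'K', 'R', 'I', 'I', 'I']
```

Picking one entry with random.choice from the returned codons would then give the single random mutation mentioned in the summary.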

getdifference(triplet_old, triplet_new)

Given two triplets, returns the differences between them plus the position of the difference.

Parameters:
  • triplet_old (string) - AA triplet.
  • triplet_new (string) - AA triplet.
Returns: Char,Char,int
The new amino acid, the old amino acid, and the position.
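
The comparison can be sketched as a position-by-position scan. This is a minimal sketch assuming both inputs are three-character strings that differ at exactly one position (as they do when the second comes from a Hamming-distance-1 mutation); the name get_difference and the ValueError are illustrative choices, and the returned tuple follows the documented (new, old, position) order.

```python
def get_difference(triplet_old, triplet_new):
    """Return (new_char, old_char, position) for the single differing position."""
    diffs = [k for k in range(3) if triplet_old[k] != triplet_new[k]]
    if len(diffs) != 1:
        raise ValueError("triplets must differ at exactly one position")
    k = diffs[0]
    return triplet_new[k], triplet_old[k], k


print(get_difference("ATG", "AAG"))  # ('A', 'T', 1)
```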

isvalidposition(pdic, iprime, distance)

Checks if a position is valid for mutation. It queries all neighboring positions (iprime ± distance) to check whether a mutation has already been recorded in pdic.

Parameters:
  • pdic (dictionary) - Dictionary containing mutations and start/stop codons.
  • iprime (int) - Position of the prospective mutation (DNA level)
  • distance (int) - User defined parameter which limits the distance between two mutations.
Returns: Bool
Boolean that decides whether the position is valid (1 = yes, 0 = no).
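
A minimal sketch of the neighborhood check, assuming pdic is keyed by integer DNA positions (its values do not matter here). The name is_valid_position is hypothetical; it returns a Python bool rather than 1/0, and the window is assumed to be inclusive at both ends.

```python
def is_valid_position(pdic, iprime, distance):
    """Return True if no key of pdic lies in [iprime - distance, iprime + distance]."""
    return all(p not in pdic
               for p in range(iprime - distance, iprime + distance + 1))


pdic = {100: "mutation"}
print(is_valid_position(pdic, 105, 10))  # False: 100 is within 105 +- 10
print(is_valid_position(pdic, 111, 10))  # True: the window starts at 101
```

Each call costs one dictionary lookup per position in the window; for very large distance values, keeping the recorded positions in a sorted list and probing it with bisect would avoid scanning the whole window.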

mutate_random(DNA, AminoAcid, distance, pdic, rev, header, Random, outputpath)

Mutates a given DNA (AminoAcid) genome sequence at several positions; the spacing between mutations is governed by the distance parameter. After each mutation, a comparable triplet is searched for to "reverse" the changes made to the AA distribution, the nucleotide distribution, and the AA neighborhood.

Parameters:
  • DNA (list) - DNA sequence of the reference genome.
  • AminoAcid (list) - AA sequence of the reference genome.
  • rev (Bool) - Boolean that decides whether unbalanced mutations are allowed (in that case only the initial mutation is performed).
  • pdic (dictionary) - Dictionary containing mutations and start/stop codons.
  • header (string) - Header for the resulting artificial reference file (fasta format).
  • Random (Bool) - Boolean for choosing one of the mutation modes (linear = 0, random = 1).
  • distance (int) - User defined parameter which limits the distance between two mutations.
Returns: list
Artificial reference genome sequence.
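
To show how the pieces fit together, here is a simplified driver loop in the spirit of mutate_random. It is a sketch under several assumptions, not the module's implementation: mutate_sketch is a hypothetical name; it works on nucleotides only and ignores the AminoAcid list, header and outputpath; pdic is assumed to be keyed by codon start positions; rev=True is read as "keep the initial mutation without searching for a partner"; and partners are found by scanning the sequence rather than through the triplet-position dictionary mentioned in the module description.

```python
import random


def mutate_sketch(dna, distance, pdic, rev=False, random_mode=True, seed=0):
    """Hypothetical, simplified driver loop in the spirit of mutate_random.

    dna:         nucleotide string, read in frame from position 0
    distance:    recorded positions must be more than `distance` apart
    pdic:        {codon start: label}, may be pre-filled with protected codons;
                 updated in place with every mutated position
    rev:         True = keep the initial mutation without a balancing partner
    random_mode: True = visit codons in random order, False = left to right
    Returns the mutated sequence as a list of characters.
    """
    rng = random.Random(seed)
    seq = list(dna)
    starts = list(range(0, len(dna) - len(dna) % 3, 3))
    order = starts[:]
    if random_mode:
        rng.shuffle(order)

    def free(i):
        # True if no recorded or protected position lies within i +- distance.
        return all(p not in pdic for p in range(i - distance, i + distance + 1))

    for start in order:
        if not free(start):
            continue
        old = "".join(seq[start:start + 3])
        # Every triplet at Hamming distance 1 from the current codon, shuffled.
        candidates = [old[:k] + b + old[k + 1:]
                      for k in range(3) for b in "TCAG" if b != old[k]]
        rng.shuffle(candidates)
        if rev:
            # Unbalanced mode (assumed reading): keep one mutation, no partner.
            seq[start:start + 3] = candidates[0]
            pdic[start] = "unbalanced"
            continue
        for new in candidates:
            # Partner: a free codon equal to `new`, farther than `distance` away.
            partners = [j for j in starts
                        if "".join(seq[j:j + 3]) == new
                        and abs(j - start) > distance and free(j)]
            if partners:
                partner = rng.choice(partners)
                seq[start:start + 3] = new        # initial mutation: old -> new
                seq[partner:partner + 3] = old    # balancing mutation: new -> old
                pdic[start] = "mutation"
                pdic[partner] = "partner"
                break
    return seq
```

In balanced mode every mutation is paired with its reverse, so the codon and nucleotide counts of "".join(mutate_sketch("ATGAAGTTGCCC" * 6, 3, {})) equal those of the input; with rev=True the loop is faster but the counts drift, which mirrors the trade-off described at the top of this module.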