Script arden_filter

Script for filtering alignments with deficient properties (read quality score, gaps, mismatches). The script generates a new SAM file from the input SAM file that contains only the alignments fulfilling certain quality thresholds. Abbreviations: nm = numberOfMisMatches; rq = readqualityscore; MQ = mappingquality.

Functions
 
writeFilteredReads(iddic, fastqfile, fastqfileoutput)
Reads a FASTQ file and filters its reads for certain quality values.
 
getMisGapRQ(alngt)
Calculates the number of gaps / mismatches from an HTSeq alignment object.
 
FilterSAM(samfile, MISM, GAPS, RQ, fsam)
Calculates the number of gaps / mismatches from an HTSeq alignment object.
 
readHeader(samfile)
Function to retrieve the header from a SAM file.
 
filter()
Function for filtering a given SAM file.
Variables
  counter1 = 0
  counter2 = 0
  __package__ = None
Function Details

writeFilteredReads(iddic, fastqfile, fastqfileoutput)

 

Reads a FASTQ file and filters its reads for certain quality values. If an alignment is at least as good as the threshold, it is written to the filtered SAM file.

Input: fastq file. Output: fastq dictionary (key = readid; value = qualstr).

Parameters:
  • iddic (dictionary) - dictionary containing read ids.
  • fastqfile (fastq) - HTSeq readfile object
  • fastqfileoutput (fastq) - HTSeq readfile object. Alignments that pass the filter are written to this file.
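
The ID-based selection described above can be sketched in plain Python. This is not the script's own code: the function name filter_fastq_by_ids is hypothetical, ordinary text file objects stand in for the HTSeq readfile objects, and simple four-line FASTQ records are assumed.

```python
import io


def filter_fastq_by_ids(keep_ids, fastq_in, fastq_out):
    """Copy FASTQ records whose read ID is in keep_ids; return the count written.

    Assumption: every record spans exactly four lines (no wrapped sequences).
    keep_ids may be a set or a dictionary keyed by read ID.
    """
    written = 0
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        seq = fastq_in.readline()
        plus = fastq_in.readline()
        qual = fastq_in.readline()
        # The read ID is the text after '@' up to the first whitespace.
        fields = header[1:].split()
        read_id = fields[0] if fields else ""
        if read_id in keep_ids:
            fastq_out.write(header + seq + plus + qual)
            written += 1
    return written


if __name__ == "__main__":
    src = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\n!!!!\n")
    dst = io.StringIO()
    count = filter_fastq_by_ids({"r1": True}, src, dst)
    print(count)
    print(dst.getvalue(), end="")
```

Passing a dictionary works because membership tests on a dictionary check its keys, which matches the documented iddic parameter.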

getMisGapRQ(alngt)

 

Calculates the number of gaps / mismatches from an HTSeq alignment object.

Parameters:
  • alngt (HTSeq alignment object) - Initial SAM file for filtering
  • gaps, mism (integer) - Number of gaps / mismatches.
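
One common way to derive these two counts uses the CIGAR string and the NM tag of an alignment. The sketch below is an assumption about the approach, not the script's actual implementation: the function name count_gaps_and_mismatches is hypothetical, it takes plain values instead of an HTSeq alignment object, and it counts gaps as inserted plus deleted bases rather than gap events.

```python
import re

# One CIGAR operation: a length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_gaps_and_mismatches(cigar, nm):
    """Return (gaps, mism) for one alignment.

    gaps: total inserted plus deleted bases (CIGAR 'I' and 'D' lengths).
    mism: NM edit distance minus the gap bases, since NM counts
          mismatches plus inserted and deleted bases.
    """
    gaps = sum(int(length) for length, op in _CIGAR_OP.findall(cigar) if op in "ID")
    mism = max(nm - gaps, 0)  # guard against inconsistent NM values
    return gaps, mism


if __name__ == "__main__":
    # 1 inserted + 2 deleted bases, NM = 5, so 2 mismatches remain.
    print(count_gaps_and_mismatches("5M1I4M2D3M", 5))
```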

FilterSAM(samfile, MISM, GAPS, RQ, fsam)

 

Calculates the number of gaps / mismatches from an HTSeq alignment object.

Parameters:
  • alngt (HTSeq alignment object) - Initial SAM file for filtering
  • gaps, mism (integer) - Number of gaps / mismatches.

filter()

 

Function for filtering a given SAM file. The alignments are evaluated individually with regard to RQS, mismatches, and gaps. The input needs to be sorted.
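
The per-alignment evaluation can be illustrated with a small stand-alone sketch that works on SAM text lines. Everything in it is an assumption for illustration: the names passes_thresholds and filter_sam_lines are hypothetical, the read quality score is taken to be the mean Phred+33 base quality, gaps are counted as inserted plus deleted bases, and a missing NM tag is treated as zero. It evaluates each line independently and does not model whatever step in the script relies on sorted input.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def passes_thresholds(sam_line, max_mism, max_gaps, min_rq):
    """Return True if one SAM alignment line meets all three thresholds."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, qual = fields[5], fields[10]
    # NM is an optional tag; treat a missing tag as zero (assumption).
    nm = 0
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
    gaps = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "ID")
    mism = max(nm - gaps, 0)
    # Mean base quality, assuming Phred+33 encoding; '*' means no qualities.
    if qual and qual != "*":
        rq = sum(ord(c) - 33 for c in qual) / len(qual)
    else:
        rq = 0.0
    return mism <= max_mism and gaps <= max_gaps and rq >= min_rq


def filter_sam_lines(lines, max_mism, max_gaps, min_rq):
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines always pass through
        elif passes_thresholds(line, max_mism, max_gaps, min_rq):
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.0\tSO:coordinate\n",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1\n",
        "r2\t0\tchr1\t200\t60\t4M2D6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:5\n",
    ]
    for kept in filter_sam_lines(sam, max_mism=2, max_gaps=1, min_rq=30):
        print(kept, end="")
```

Keeping the threshold test in its own function makes it easy to swap in a different definition of the read quality score without touching the loop over lines.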