libStatGen Software  1
FastQFile Class Reference

Class for reading/validating a fastq file.

#include <FastQFile.h>

Public Member Functions

 FastQFile (int minReadLength=10, int numPrintableErrors=20)
 Constructor.
 
void disableMessages ()
 Disable messages - do not write to cout.
 
void enableMessages ()
 Enable messages - write to cout.
 
void disableSeqIDCheck ()
 Disable Unique Sequence ID checking (Unique Sequence ID checking is enabled by default).
 
void enableSeqIDCheck ()
 Enable Unique Sequence ID checking.
 
void setMaxErrors (int maxErrors)
 Set the number of errors after which to quit reading/validating a file (defaults to -1).
 
FastQStatus::Status openFile (const char *fileName, BaseAsciiMap::SPACE_TYPE spaceType=BaseAsciiMap::UNKNOWN)
 Open a FastQFile.
 
FastQStatus::Status closeFile ()
 Close a FastQFile.
 
bool isOpen ()
 Check to see if the file is open.
 
bool isEof ()
 Check to see if the file is at the end of the file.
 
bool keepReadingFile ()
 Returns whether to keep reading the file; returns false at end of file or if there is a problem reading the file.
 
FastQStatus::Status validateFastQFile (const String &filename, bool printBaseComp, BaseAsciiMap::SPACE_TYPE spaceType, bool printQualAvg=false)
 Validate the specified fastq file.
 
FastQStatus::Status readFastQSequence ()
 Read 1 FastQSequence, validating it.
 

Public Sequence Line variables.

Keep public variables for a sequence's line so they can be accessed without having to do string copies.

String myRawSequence
 
String mySequenceIdLine
 
String mySequenceIdentifier
 
String myPlusLine
 
String myQualityString
 
BaseAsciiMap::SPACE_TYPE getSpaceType ()
 Get the space type used for this file.
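
The public line variables above correspond to the lines of one FASTQ record. As a rough illustration only, the sketch below reads a single record from a stream into a struct whose fields are named after those variables. It is not libStatGen code: `FastqRecord` and `readRecord` are hypothetical names, the raw sequence is assumed to sit on one line, and the identifier is assumed to be the text between '@' and the first space.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the public line variables; not part of libStatGen.
struct FastqRecord {
    std::string sequenceIdLine;      // full first line, including the leading '@'
    std::string sequenceIdentifier;  // assumed: text between '@' and the first space
    std::string rawSequence;         // assumed to be on a single line
    std::string plusLine;            // line starting with '+'
    std::string qualityString;       // one quality character per base
};

// Reads one four-line record. Returns false at end of input or when the
// record is structurally broken (missing '@' or '+', or length mismatch).
bool readRecord(std::istream& in, FastqRecord& rec) {
    if (!std::getline(in, rec.sequenceIdLine)) return false;
    if (rec.sequenceIdLine.empty() || rec.sequenceIdLine[0] != '@') return false;

    const std::size_t space = rec.sequenceIdLine.find(' ');
    rec.sequenceIdentifier = (space == std::string::npos)
        ? rec.sequenceIdLine.substr(1)
        : rec.sequenceIdLine.substr(1, space - 1);

    if (!std::getline(in, rec.rawSequence)) return false;
    if (!std::getline(in, rec.plusLine)) return false;
    if (rec.plusLine.empty() || rec.plusLine[0] != '+') return false;
    if (!std::getline(in, rec.qualityString)) return false;

    return rec.qualityString.size() == rec.rawSequence.size();
}
```

Feeding the function a `std::istringstream` holding a single record fills all five fields; a second call on the same stream returns false because no further record is available.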
 

Detailed Description

Class for reading/validating a fastq file.

Definition at line 29 of file FastQFile.h.

Constructor & Destructor Documentation

◆ FastQFile()

FastQFile::FastQFile ( int  minReadLength = 10,
int  numPrintableErrors = 20 
)

Constructor.

Parameters
    minReadLength       The minimum length that a base sequence must be for it to be valid.
    numPrintableErrors  The maximum number of errors that should be reported in detail before suppressing the errors.

Definition at line 30 of file FastQFile.cpp.

    FastQFile::FastQFile(int minReadLength, int numPrintableErrors)
        : myFile(NULL),
          myBaseComposition(),
          myQualPerCycle(),
          myCountPerCycle(),
          myCheckSeqID(true),
          myMinReadLength(minReadLength),
          myNumPrintableErrors(numPrintableErrors),
          myMaxErrors(-1),
          myDisableMessages(false),
          myFileProblem(false)
    {
        // Reset the member data.
        reset();
    }
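
To make the role of numPrintableErrors concrete, here is a minimal sketch of capping detailed error output while still counting every error. It only illustrates the idea described for the parameter: `ErrorLog` is a hypothetical class, and it collects messages in a vector rather than writing to cout as the real class does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of libStatGen: counts every error but keeps
// the detailed message only for the first numPrintableErrors of them.
class ErrorLog {
public:
    explicit ErrorLog(int numPrintableErrors)
        : myNumPrintableErrors(numPrintableErrors), myNumErrors(0) {}

    void report(const std::string& message) {
        ++myNumErrors;
        if (myNumErrors <= myNumPrintableErrors) {
            myPrinted.push_back(message);
        }
    }

    int numErrors() const { return myNumErrors; }
    const std::vector<std::string>& printed() const { return myPrinted; }

private:
    int myNumPrintableErrors;
    int myNumErrors;
    std::vector<std::string> myPrinted;
};
```

With a limit of 2, reporting three errors leaves the total at 3 while only the first two messages are retained.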

Member Function Documentation

◆ enableSeqIDCheck()

void FastQFile::enableSeqIDCheck ( )

Enable Unique Sequence ID checking.

(Unique Sequence ID checking is enabled by default).

Definition at line 69 of file FastQFile.cpp.

    void FastQFile::enableSeqIDCheck()
    {
        myCheckSeqID = true;
    }

◆ openFile()

FastQStatus::Status FastQFile::openFile ( const char *  fileName,
BaseAsciiMap::SPACE_TYPE  spaceType = BaseAsciiMap::UNKNOWN 
)

Open a FastQFile.

Use the specified SPACE_TYPE to determine BASE, COLOR, or UNKNOWN.

Definition at line 83 of file FastQFile.cpp.

    FastQStatus::Status FastQFile::openFile(const char* fileName,
                                            BaseAsciiMap::SPACE_TYPE spaceType)
    {
        // reset the member data.
        reset();

        myBaseComposition.resetBaseMapType();
        myBaseComposition.setBaseMapType(spaceType);
        myQualPerCycle.clear();
        myCountPerCycle.clear();

        FastQStatus::Status status = FastQStatus::FASTQ_SUCCESS;

        // Close the file if there is already one open - checked by close.
        status = closeFile();
        if(status == FastQStatus::FASTQ_SUCCESS)
        {
            // Successfully closed a previously opened file if there was one.

            // Open the file
            myFile = ifopen(fileName, "rt");
            myFileName = fileName;

            if(myFile == NULL)
            {
                // Failed to open the file.
                status = FastQStatus::FASTQ_OPEN_ERROR;
            }
        }

        if(status != FastQStatus::FASTQ_SUCCESS)
        {
            // Failed to open the file.
            std::string errorMessage = "ERROR: Failed to open file: ";
            errorMessage += fileName;
            logMessage(errorMessage.c_str());
        }
        return(status);
    }

References closeFile(), FastQStatus::FASTQ_OPEN_ERROR, FastQStatus::FASTQ_SUCCESS, ifopen(), BaseComposition::resetBaseMapType(), and BaseComposition::setBaseMapType().

Referenced by validateFastQFile().

◆ setMaxErrors()

void FastQFile::setMaxErrors ( int  maxErrors)

Set the number of errors after which to quit reading/validating a file (defaults to -1).

Parameters
    maxErrors  Number of errors before quitting. -1 means do not quit until the entire file has been read/validated (default); 0 means quit without reading/validating anything.

Definition at line 76 of file FastQFile.cpp.

    void FastQFile::setMaxErrors(int maxErrors)
    {
        myMaxErrors = maxErrors;
    }
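
The effect of the -1 and 0 settings follows from the loop condition used in the validateFastQFile() listing, `(myMaxErrors == -1) || (myMaxErrors > myNumErrors)`. The sketch below isolates that condition in a standalone function and counts how many records a validation loop would process. The function names are hypothetical, and the model assumes each invalid record contributes exactly one error, which is a simplification.

```cpp
#include <vector>

// Same test as the loop condition in validateFastQFile(): -1 means no limit.
bool withinErrorLimit(int maxErrors, int numErrors) {
    return maxErrors == -1 || maxErrors > numErrors;
}

// Hypothetical model of the validation loop: each entry says whether one
// record is valid. Returns how many records are processed before stopping.
int countProcessed(const std::vector<bool>& recordIsValid, int maxErrors) {
    int numErrors = 0;
    int processed = 0;
    for (bool valid : recordIsValid) {
        if (!withinErrorLimit(maxErrors, numErrors)) {
            break;  // error limit reached, stop reading
        }
        ++processed;
        if (!valid) {
            ++numErrors;  // simplification: one error per invalid record
        }
    }
    return processed;
}
```

For five records where the second and fourth are invalid, a limit of -1 processes all five, 0 processes none, 1 stops after the second record, and 2 stops after the fourth.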

◆ validateFastQFile()

FastQStatus::Status FastQFile::validateFastQFile ( const String &  filename,
bool  printBaseComp,
BaseAsciiMap::SPACE_TYPE  spaceType,
bool  printQualAvg = false 
)

Validate the specified fastq file.

Parameters
    filename       fastq file to be validated.
    printBaseComp  whether or not to print the base composition for the file. true means print it, false means do not.
    spaceType      the spaceType to use for validation: BASE_SPACE, COLOR_SPACE, or UNKNOWN (UNKNOWN means to determine the spaceType to validate against from the first character of the first sequence).
    printQualAvg   whether or not to print the quality averages for the file. true means print it, false (default) means do not.
Returns
    The fastq validation status; FASTQ_SUCCESS for a successfully validated fastq file.

Definition at line 195 of file FastQFile.cpp.

    FastQStatus::Status FastQFile::validateFastQFile(const String& filename,
                                                     bool printBaseComp,
                                                     BaseAsciiMap::SPACE_TYPE spaceType,
                                                     bool printQualAvg)
    {
        // Open the fastqfile.
        if(openFile(filename, spaceType) != FastQStatus::FASTQ_SUCCESS)
        {
            // Failed to open the specified file.
            return(FastQStatus::FASTQ_OPEN_ERROR);
        }

        // Track the total number of sequences that were validated.
        int numSequences = 0;

        // Keep reading the file until there are no more fastq sequences to process
        // and not configured to quit after a certain number of errors or there
        // has not yet been that many errors.
        // Or exit if there is a problem reading the file.
        FastQStatus::Status status = FastQStatus::FASTQ_SUCCESS;
        while (keepReadingFile() &&
               ((myMaxErrors == -1) || (myMaxErrors > myNumErrors)))
        {
            // Validate one sequence. This call will read all the lines for
            // one sequence.
            status = readFastQSequence();
            if((status == FastQStatus::FASTQ_SUCCESS) || (status == FastQStatus::FASTQ_INVALID))
            {
                // Read a sequence and it is either valid or invalid, but
                // either way, a sequence was read, so increment the sequence count.
                ++numSequences;
            }
            else
            {
                // Other error, so break out of processing.
                break;
            }
        }

        // Report Base Composition Statistics.
        if(printBaseComp)
        {
            myBaseComposition.print();
        }

        if(printQualAvg)
        {
            printAvgQual();
        }

        std::string finishMessage = "Finished processing ";
        finishMessage += myFileName.c_str();
        char buffer[100];
        if(sprintf(buffer,
                   " with %u lines containing %d sequences.",
                   myLineNum, numSequences) > 0)
        {
            finishMessage += buffer;
            logMessage(finishMessage.c_str());
        }
        if(sprintf(buffer,
                   "There were a total of %d errors.",
                   myNumErrors) > 0)
        {
            logMessage(buffer);
        }

        // Close the input file.
        FastQStatus::Status closeStatus = closeFile();

        if((status != FastQStatus::FASTQ_SUCCESS) && (status != FastQStatus::FASTQ_INVALID))
        {
            // Stopped validating due to some error other than invalid, so
            // return that error.
            return(status);
        }
        else if(myNumErrors == 0)
        {
            // No errors, check to see if there were any sequences.
            // Finished processing all of the sequences in the file.
            // If there are no sequences, report an error.
            if(numSequences == 0)
            {
                // Empty file, return error.
                logMessage("ERROR: No FastQSequences in the file.");
                return(FastQStatus::FASTQ_NO_SEQUENCE_ERROR);
            }
            return(FastQStatus::FASTQ_SUCCESS);
        }
        else
        {
            // The file is invalid. But check the close status. If the close
            // failed, it means there is a problem with the file itself not just
            // with validation, so the close failure should be returned.
            if(closeStatus != FastQStatus::FASTQ_SUCCESS)
            {
                return(closeStatus);
            }
            return(FastQStatus::FASTQ_INVALID);
        }
    }

References closeFile(), FastQStatus::FASTQ_INVALID, FastQStatus::FASTQ_NO_SEQUENCE_ERROR, FastQStatus::FASTQ_OPEN_ERROR, FastQStatus::FASTQ_SUCCESS, keepReadingFile(), openFile(), BaseComposition::print(), and readFastQSequence().
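
The order in which validateFastQFile() chooses its return value can be summarized as a small pure function. The sketch below is an illustration, not library code: `Status` is a hypothetical local enum standing in for FastQStatus::Status (with `OtherError` as a placeholder for any status other than the ones named on this page), `finalStatus` is a hypothetical name, and the returned values are those implied by the comments in the listing and by the FastQStatus values it references.

```cpp
// Hypothetical local stand-in for FastQStatus::Status.
enum class Status { Success, Invalid, OpenError, NoSequenceError, OtherError };

// Decision order at the end of validation:
//  1. A read failure other than "invalid" is returned as-is.
//  2. With no errors, an empty file is an error; otherwise success.
//  3. With errors, a failed close takes precedence over "invalid".
Status finalStatus(Status lastReadStatus, int numErrors, int numSequences,
                   Status closeStatus) {
    if (lastReadStatus != Status::Success && lastReadStatus != Status::Invalid) {
        return lastReadStatus;
    }
    if (numErrors == 0) {
        if (numSequences == 0) {
            return Status::NoSequenceError;
        }
        return Status::Success;
    }
    if (closeStatus != Status::Success) {
        return closeStatus;
    }
    return Status::Invalid;
}
```

Note that in this ordering the close status is only consulted when validation errors were found; a file with no errors and at least one sequence reports success.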


The documentation for this class was generated from the following files:
FastQFile.h
FastQFile.cpp