libStatGen Software  1
SamRecord Class Reference

Class providing an easy to use interface to get/set/operate on the fields in a SAM/BAM record.

#include <SamRecord.h>

Public Types

enum  SequenceTranslation { NONE, EQUAL, BASES }
 Enum containing the settings on how to translate the sequence if a reference is available.

Public Member Functions

 SamRecord ()
 Default Constructor.

 SamRecord (ErrorHandler::HandlingType errorHandlingType)
 Constructor that sets the error handling type.

 ~SamRecord ()
 Destructor.

void resetRecord ()
 Reset the fields of the record to a default value.

bool isValid (SamFileHeader &header)
 Returns whether or not the record is valid, setting the status to indicate success or failure.

void setReference (GenomeSequence *reference)
 Set the reference to the specified genome sequence object.

void setSequenceTranslation (SequenceTranslation translation)
 Set the type of sequence translation to use when getting the sequence.
 
Set Alignment Data

Set methods for record fields.

All of the "set" methods set the status to indicate success or the failure reason.

bool setReadName (const char *readName)
 Set QNAME to the passed in name.

bool setFlag (uint16_t flag)
 Set the bitwise FLAG to the specified value.

bool setReferenceName (SamFileHeader &header, const char *referenceName)
 Set the reference sequence name (RNAME) to the specified name, using the header to determine the reference id.

bool set1BasedPosition (int32_t position)
 Set the leftmost position (POS) using the specified 1-based (SAM format) value.

bool set0BasedPosition (int32_t position)
 Set the leftmost position using the specified 0-based (BAM format) value.

bool setMapQuality (uint8_t mapQuality)
 Set the mapping quality (MAPQ).

bool setCigar (const char *cigar)
 Set the CIGAR to the specified SAM formatted cigar string.

bool setCigar (const Cigar &cigar)
 Set the CIGAR to the specified Cigar object.

bool setMateReferenceName (SamFileHeader &header, const char *mateReferenceName)
 Set the mate/next fragment's reference sequence name (RNEXT) to the specified name, using the header to determine the mate reference id.

bool set1BasedMatePosition (int32_t matePosition)
 Set the mate/next fragment's leftmost position (PNEXT) using the specified 1-based (SAM format) value.

bool set0BasedMatePosition (int32_t matePosition)
 Set the mate/next fragment's leftmost position using the specified 0-based (BAM format) value.

bool setInsertSize (int32_t insertSize)
 Sets the inferred insert size (ISIZE)/observed template length (TLEN).

bool setSequence (const char *seq)
 Sets the sequence (SEQ) to the specified SAM formatted sequence string.

bool setQuality (const char *quality)
 Sets the quality (QUAL) to the specified SAM formatted quality string.

bool shiftIndelsLeft ()
 Shift the indels (if any) to the left by updating the CIGAR.

SamStatus::Status setBuffer (const char *fromBuffer, uint32_t fromBufferSize, SamFileHeader &header)
 Sets the SamRecord to contain the information in the BAM formatted fromBuffer.

SamStatus::Status setBufferFromFile (IFILE filePtr, SamFileHeader &header)
 Read the BAM record from a file.
 
Set Tag Data

Set methods for tags.

bool addIntTag (const char *tag, int32_t value)
 Add the specified integer tag to the record.

bool addTag (const char *tag, char vtype, const char *value)
 Add the specified tag, vtype, and value to the record.

void clearTags ()
 Clear the tags in this record.

bool rmTag (const char *tag, char type)
 Remove a tag.

bool rmTags (const char *tags)
 Remove tags.
 
Get Alignment Data

Get methods for record fields.

All of the "get" methods set the status to indicate success or the failure reason.

const void * getRecordBuffer ()
 Get a const pointer to the buffer that contains the BAM representation of the record.

const void * getRecordBuffer (SequenceTranslation translation)
 Get a const pointer to the buffer that contains the BAM representation of the record using the specified translation on the sequence.

SamStatus::Status writeRecordBuffer (IFILE filePtr)
 Write the record in BAM format to the specified, already opened file.

SamStatus::Status writeRecordBuffer (IFILE filePtr, SequenceTranslation translation)
 Write the record in BAM format to the specified, already opened file using the specified translation on the sequence.

int32_t getBlockSize ()
 Get the block size of the record (BAM format).

const char * getReferenceName ()
 Get the reference sequence name (RNAME) of the record.

int32_t getReferenceID ()
 Get the reference sequence id of the record (BAM format rid).

int32_t get1BasedPosition ()
 Get the 1-based (SAM) leftmost position (POS) of the record.

int32_t get0BasedPosition ()
 Get the 0-based (BAM) leftmost position of the record.

uint8_t getReadNameLength ()
 Get the length of the read name (QNAME) including the null terminator.

uint8_t getMapQuality ()
 Get the mapping quality (MAPQ) of the record.

uint16_t getBin ()
 Get the BAM bin for the record.

uint16_t getCigarLength ()
 Get the length of the BAM formatted CIGAR.

uint16_t getFlag ()
 Get the flag (FLAG).

int32_t getReadLength ()
 Get the length of the read.

const char * getMateReferenceName ()
 Get the mate/next fragment's reference sequence name (RNEXT).

const char * getMateReferenceNameOrEqual ()
 Get the mate/next fragment's reference sequence name (RNEXT), returning "=" if it is the same as the reference name, unless they are both "*" in which case "*" is returned.

int32_t getMateReferenceID ()
 Get the mate reference id of the record (BAM format: mate_rid/next_refID).

int32_t get1BasedMatePosition ()
 Get the 1-based (SAM) leftmost mate/next fragment's position (PNEXT).

int32_t get0BasedMatePosition ()
 Get the 0-based (BAM) leftmost mate/next fragment's position.

int32_t getInsertSize ()
 Get the inferred insert size of the read pair (ISIZE) or observed template length (TLEN).

int32_t get0BasedAlignmentEnd ()
 Returns the 0-based inclusive rightmost position of the clipped sequence.

int32_t get1BasedAlignmentEnd ()
 Returns the 1-based inclusive rightmost position of the clipped sequence.

int32_t getAlignmentLength ()
 Returns the length of the clipped sequence, returning 0 if the cigar is '*'.

int32_t get0BasedUnclippedStart ()
 Returns the 0-based inclusive left-most position adjusted for clipped bases.

int32_t get1BasedUnclippedStart ()
 Returns the 1-based inclusive left-most position adjusted for clipped bases.

int32_t get0BasedUnclippedEnd ()
 Returns the 0-based inclusive right-most position adjusted for clipped bases.

int32_t get1BasedUnclippedEnd ()
 Returns the 1-based inclusive right-most position adjusted for clipped bases.

const char * getReadName ()
 Returns the SAM formatted Read Name (QNAME).

const char * getCigar ()
 Returns the SAM formatted CIGAR string.

const char * getSequence ()
 Returns the SAM formatted sequence string (SEQ), translating the base as specified by setSequenceTranslation.

const char * getSequence (SequenceTranslation translation)
 Returns the SAM formatted sequence string (SEQ) performing the specified sequence translation.

const char * getQuality ()
 Returns the SAM formatted quality string (QUAL).

char getSequence (int index)
 Get the sequence base at the specified index into this sequence (0 to readLength - 1), translating the base as specified by setSequenceTranslation.

char getSequence (int index, SequenceTranslation translation)
 Get the sequence base at the specified index into this sequence (0 to readLength - 1), performing the specified sequence translation.

char getQuality (int index)
 Get the quality character at the specified index into the quality (0 to readLength - 1).

Cigar * getCigarInfo ()
 Returns a pointer to the Cigar object associated with this record.

uint32_t getNumOverlaps (int32_t start, int32_t end)
 Return the number of bases in this read that overlap the passed in region.

bool getFields (bamRecordStruct &recStruct, String &readName, String &cigar, String &sequence, String &quality)
 Returns the values of all fields except the tags.

bool getFields (bamRecordStruct &recStruct, String &readName, String &cigar, String &sequence, String &quality, SequenceTranslation translation)
 Returns the values of all fields except the tags using the specified sequence translation.

GenomeSequence * getReference ()
 Returns a pointer to the genome sequence object associated with this record if it was set (NULL if it was not set).
 

Get Tag Methods

Get methods for obtaining information on tags.

uint32_t getTagLength ()
 Returns the length of the BAM formatted tags.

bool getNextSamTag (char *tag, char &vtype, void **value)
 Get the next tag from the record.

void resetTagIter ()
 Reset the tag iterator to the beginning of the tags.

bool getTagsString (const char *tags, String &returnString, char delim='\t')
 Get the string representation of the tags from the record, formatted as TAG:TYPE:VALUE<delim>TAG:TYPE:VALUE...

const String * getStringTag (const char *tag)
 Get the string value for the specified tag.

int * getIntegerTag (const char *tag)
 Get the integer value for the specified tag. DEPRECATED: use the overload that returns a bool (success/failure).

bool getIntegerTag (const char *tag, int &tagVal)
 Get the integer value for the specified tag.

bool getFloatTag (const char *tag, float &tagVal)
 Get the float value for the specified tag.

const String & getString (const char *tag)
 Get the string value for the specified tag.

int & getInteger (const char *tag)
 Get the integer value for the specified tag. DEPRECATED: use the getIntegerTag overload that returns a bool.

bool checkString (const char *tag)
 Check if the specified tag contains a string.

bool checkInteger (const char *tag)
 Check if the specified tag contains an integer.

bool checkFloat (const char *tag)
 Check if the specified tag contains a float.

bool checkTag (const char *tag, char type)
 Check if the specified tag contains a value of the specified vtype.

const SamStatus & getStatus ()
 Returns the status associated with the last method that sets the status.

static bool isIntegerType (char vtype)
 Returns whether or not the specified vtype is an integer type.

static bool isFloatType (char vtype)
 Returns whether or not the specified vtype is a float type.

static bool isCharType (char vtype)
 Returns whether or not the specified vtype is a char type.

static bool isStringType (char vtype)
 Returns whether or not the specified vtype is a string type.
 

Detailed Description

Class providing an easy to use interface to get/set/operate on the fields in a SAM/BAM record.


Definition at line 51 of file SamRecord.h.

Member Enumeration Documentation

◆ SequenceTranslation

Enum containing the settings on how to translate the sequence if a reference is available.

If no reference is available, no translation is done.

Enumerator
NONE 

Leave the sequence as is.

EQUAL 

Translate bases that match the reference to '='.

BASES 

Translate '=' to the actual base.

Definition at line 57 of file SamRecord.h.

57  {
58  NONE, ///< Leave the sequence as is.
59  EQUAL, ///< Translate bases that match the reference to '='
60  BASES, ///< Translate '=' to the actual base.
61  };
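
As a rough illustration of what the three settings mean, the following standalone sketch translates a read against the reference bases it covers. It does not use SamRecord or GenomeSequence: the `Translation` enum and `translate` function are hypothetical stand-ins, and the sketch assumes the read and reference strings are already aligned position by position (no indels or clips).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for SamRecord::SequenceTranslation.
enum class Translation { NONE, EQUAL, BASES };

// Translate a read against the reference bases it aligns to.
// Assumption: read[i] aligns to ref[i] for every i (no indels, no clips).
std::string translate(const std::string& read,
                      const std::string& ref,
                      Translation mode)
{
    std::string out = read;
    // With no usable reference, or with NONE, the sequence is left as is.
    if (mode == Translation::NONE || ref.size() != read.size())
    {
        return out;
    }
    for (std::size_t i = 0; i < read.size(); ++i)
    {
        if (mode == Translation::EQUAL && read[i] == ref[i])
        {
            out[i] = '=';      // base matches the reference
        }
        else if (mode == Translation::BASES && read[i] == '=')
        {
            out[i] = ref[i];   // expand '=' back to the actual base
        }
    }
    return out;
}
```

Under these assumptions, EQUAL and BASES are inverses of each other whenever the reference is available.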

Constructor & Destructor Documentation

◆ SamRecord()

SamRecord::SamRecord ( ErrorHandler::HandlingType  errorHandlingType)

Constructor that sets the error handling type.

Parameters
errorHandlingType  how to handle errors.

Definition at line 53 of file SamRecord.cpp.

54  : myStatus(errorHandlingType),
55  myRefPtr(NULL),
56  mySequenceTranslation(NONE)
57 {
58  int32_t defaultAllocSize = DEFAULT_BLOCK_SIZE + sizeof(int32_t);
59 
60  myRecordPtr =
61  (bamRecordStruct *) malloc(defaultAllocSize);
62 
63  myCigarTempBuffer = NULL;
64  myCigarTempBufferAllocatedSize = 0;
65 
66  allocatedSize = defaultAllocSize;
67 
68  resetRecord();
69 }

References resetRecord().

Member Function Documentation

◆ addIntTag()

bool SamRecord::addIntTag ( const char *  tag,
int32_t  value 
)

Add the specified integer tag to the record.

Internal processing handles switching between SAM/BAM formats when read/written and determining the type for BAM format. If the tag is already there this code will replace it if the specified value is different.

Parameters
tag  two character tag to be added to the SAM/BAM record.
value  value for the specified tag.
Returns
true if the tag was successfully added, false otherwise.

Definition at line 635 of file SamRecord.cpp.

636 {
637  myStatus = SamStatus::SUCCESS;
638  int key = 0;
639  int index = 0;
640  char bamvtype;
641 
642  int tagBufferSize = 0;
643 
644  // First check to see if the tags need to be synced to the buffer.
645  if(myNeedToSetTagsFromBuffer)
646  {
647  if(!setTagsFromBuffer())
648  {
649  // Failed to read tags from the buffer, so cannot add new ones.
650  return(false);
651  }
652  }
653 
654  // Ints come in as int. But it can be represented in fewer bits.
655  // So determine a more specific type that is in line with the
656  // types for BAM files.
657  // First check to see if it is a negative.
658  if(value < 0)
659  {
660  // The int is negative, so it will need to use a signed type.
661  // See if it is greater than the min value for a char.
662  if(value > ((std::numeric_limits<char>::min)()))
663  {
664  // It can be stored in a signed char.
665  bamvtype = 'c';
666  tagBufferSize += 4;
667  }
668  else if(value > ((std::numeric_limits<short>::min)()))
669  {
670  // It fits in a signed short.
671  bamvtype = 's';
672  tagBufferSize += 5;
673  }
674  else
675  {
676  // Just store it as a signed int.
677  bamvtype = 'i';
678  tagBufferSize += 7;
679  }
680  }
681  else
682  {
683  // It is positive, so an unsigned type can be used.
684  if(value < ((std::numeric_limits<unsigned char>::max)()))
685  {
686  // It is under the max of an unsigned char.
687  bamvtype = 'C';
688  tagBufferSize += 4;
689  }
690  else if(value < ((std::numeric_limits<unsigned short>::max)()))
691  {
692  // It is under the max of an unsigned short.
693  bamvtype = 'S';
694  tagBufferSize += 5;
695  }
696  else
697  {
698  // Just store it as an unsigned int.
699  bamvtype = 'I';
700  tagBufferSize += 7;
701  }
702  }
703 
704  // Check to see if the tag is already there.
705  key = MAKEKEY(tag[0], tag[1], bamvtype);
706  unsigned int hashIndex = extras.Find(key);
707  if(hashIndex != LH_NOTFOUND)
708  {
709  // Tag was already found.
710  index = extras[hashIndex];
711 
712  // Since the tagBufferSize was already updated with the new value,
713  // subtract the size for the previous tag (even if they are the same).
714  switch(intType[index])
715  {
716  case 'c':
717  case 'C':
718  case 'A':
719  tagBufferSize -= 4;
720  break;
721  case 's':
722  case 'S':
723  tagBufferSize -= 5;
724  break;
725  case 'i':
726  case 'I':
727  tagBufferSize -= 7;
728  break;
729  default:
730  myStatus.setStatus(SamStatus::INVALID,
731  "unknown tag inttype type found.\n");
732  return(false);
733  }
734 
735  // Tag already existed, print message about overwriting.
736  // WARN about dropping duplicate tags.
737  if(myNumWarns++ < myMaxWarns)
738  {
739  String newVal;
740  String origVal;
741  appendIntArrayValue(index, origVal);
742  appendIntArrayValue(bamvtype, value, newVal);
743  fprintf(stderr, "WARNING: Duplicate Tags, overwritting %c%c:%c:%s with %c%c:%c:%s\n",
744  tag[0], tag[1], intType[index], origVal.c_str(), tag[0], tag[1], bamvtype, newVal.c_str());
745  if(myNumWarns == myMaxWarns)
746  {
747  fprintf(stderr, "Suppressing rest of Duplicate Tag warnings.\n");
748  }
749  }
750 
751  // Update the integer value and type.
752  integers[index] = value;
753  intType[index] = bamvtype;
754  }
755  else
756  {
757  // Tag is not already there, so add it.
758  index = integers.Length();
759 
760  integers.Push(value);
761  intType.push_back(bamvtype);
762 
763  extras.Add(key, index);
764  }
765 
766  // The buffer tags are now out of sync.
767  myNeedToSetTagsInBuffer = true;
768  myIsTagsBufferValid = false;
769  myIsBufferSynced = false;
770  myTagBufferSize += tagBufferSize;
771 
772  return(true);
773 }

References StatGenStatus::INVALID, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

Referenced by addTag().
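
The type selection in the listing above can be sketched in isolation. This is a simplified stand-in rather than the library's code: `pickBamIntType` is a hypothetical name, and it uses fixed-width types (`int8_t`, `uint8_t`, and so on) where the listing uses `char` and `short`, so the result does not depend on whether plain `char` is signed on the platform. It keeps the strict comparisons of the listing, so a value equal to a type's limit is pushed to the next wider type.

```cpp
#include <cstdint>
#include <limits>

// Pick the BAM integer vtype used to store a value.
// Mirrors the strict '>' and '<' comparisons of the listing, so the
// boundary values (-128, 255, -32768, 65535) go to the wider type.
char pickBamIntType(int32_t value)
{
    if (value < 0)
    {
        // Negative values need a signed type.
        if (value > std::numeric_limits<int8_t>::min())  return 'c';
        if (value > std::numeric_limits<int16_t>::min()) return 's';
        return 'i';
    }
    // Non-negative values can use an unsigned type.
    if (value < std::numeric_limits<uint8_t>::max())  return 'C';
    if (value < std::numeric_limits<uint16_t>::max()) return 'S';
    return 'I';
}
```

For example, `pickBamIntType(200)` yields `'C'`, while `pickBamIntType(255)` yields `'S'` because of the strict comparison.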

◆ addTag()

bool SamRecord::addTag ( const char *  tag,
char  vtype,
const char *  value 
)

Add the specified tag,vtype,value to the record.

Vtype can be SAM/BAM format. Internal processing handles switching between SAM/BAM formats when read/written. If the tag is already there this code will replace it if the specified value is different.

Parameters
tag  two character tag to be added to the SAM/BAM record.
vtype  vtype of the specified value (either a SAM or a BAM vtype).
value  value as a string for the specified tag.
Returns
true if the tag was successfully added, false otherwise.

Definition at line 779 of file SamRecord.cpp.

780 {
781  if(vtype == 'i')
782  {
783  // integer type. Call addIntTag to handle it.
784  int intVal = atoi(valuePtr);
785  return(addIntTag(tag, intVal));
786  }
787 
788  // Non-int type.
789  myStatus = SamStatus::SUCCESS;
790  bool status = true; // default to successful.
791  int key = 0;
792  int index = 0;
793 
794  int tagBufferSize = 0;
795 
796  // First check to see if the tags need to be synced to the buffer.
797  if(myNeedToSetTagsFromBuffer)
798  {
799  if(!setTagsFromBuffer())
800  {
801  // Failed to read tags from the buffer, so cannot add new ones.
802  return(false);
803  }
804  }
805 
806  // First check to see if the tag is already there.
807  key = MAKEKEY(tag[0], tag[1], vtype);
808  unsigned int hashIndex = extras.Find(key);
809  if(hashIndex != LH_NOTFOUND)
810  {
811  // The key was found in the hash, so get the lookup index.
812  index = extras[hashIndex];
813 
814  String origTag;
815  char origType = vtype;
816 
817  // Adjust the currently pointed to value to the new setting.
818  switch (vtype)
819  {
820  case 'A' :
821  // First check to see if the value changed.
822  if((integers[index] == (const int)*(valuePtr)) &&
823  (intType[index] == vtype))
824  {
825  // The value & type has not changed, so do nothing.
826  return(true);
827  }
828  else
829  {
830  // Tag buffer size changes if type changes, so subtract & add.
831  origType = intType[index];
832  appendIntArrayValue(index, origTag);
833  tagBufferSize -= getNumericTagTypeSize(intType[index]);
834  tagBufferSize += getNumericTagTypeSize(vtype);
835  integers[index] = (const int)*(valuePtr);
836  intType[index] = vtype;
837  }
838  break;
839  case 'Z' :
840  // First check to see if the value changed.
841  if(strings[index] == valuePtr)
842  {
843  // The value has not changed, so do nothing.
844  return(true);
845  }
846  else
847  {
848  // Adjust the tagBufferSize by removing the size of the old string.
849  origTag = strings[index];
850  tagBufferSize -= strings[index].Length();
851  strings[index] = valuePtr;
852  // Adjust the tagBufferSize by adding the size of the new string.
853  tagBufferSize += strings[index].Length();
854  }
855  break;
856  case 'B' :
857  // First check to see if the value changed.
858  if(strings[index] == valuePtr)
859  {
860  // The value has not changed, so do nothing.
861  return(true);
862  }
863  else
864  {
865  // Adjust the tagBufferSize by removing the size of the old field.
866  origTag = strings[index];
867  tagBufferSize -= getBtagBufferSize(strings[index]);
868  strings[index] = valuePtr;
869  // Adjust the tagBufferSize by adding the size of the new field.
870  tagBufferSize += getBtagBufferSize(strings[index]);
871  }
872  break;
873  case 'f' :
874  // First check to see if the value changed.
875  if(floats[index] == (float)atof(valuePtr))
876  {
877  // The value has not changed, so do nothing.
878  return(true);
879  }
880  else
881  {
882  // Tag buffer size doesn't change between different 'f' entries.
883  origTag.appendFullFloat(floats[index]);
884  floats[index] = (float)atof(valuePtr);
885  }
886  break;
887  default :
888  fprintf(stderr,
889  "samRecord::addTag() - Unknown custom field of type %c\n",
890  vtype);
891  myStatus.setStatus(SamStatus::FAIL_PARSE,
892  "Unknown custom field in a tag");
893  status = false;
894  break;
895  }
896 
897  // Duplicate tag in this record.
898  // Tag already existed, print message about overwriting.
899  // WARN about dropping duplicate tags.
900  if(myNumWarns++ < myMaxWarns)
901  {
902  fprintf(stderr, "WARNING: Duplicate Tags, overwritting %c%c:%c:%s with %c%c:%c:%s\n",
903  tag[0], tag[1], origType, origTag.c_str(), tag[0], tag[1], vtype, valuePtr);
904  if(myNumWarns == myMaxWarns)
905  {
906  fprintf(stderr, "Suppressing rest of Duplicate Tag warnings.\n");
907  }
908  }
909  }
910  else
911  {
912  // The key was not found in the hash, so add it.
913  switch (vtype)
914  {
915  case 'A' :
916  index = integers.Length();
917  integers.Push((const int)*(valuePtr));
918  intType.push_back(vtype);
919  tagBufferSize += 4;
920  break;
921  case 'Z' :
922  index = strings.Length();
923  strings.Push(valuePtr);
924  tagBufferSize += 4 + strings.Last().Length();
925  break;
926  case 'B' :
927  index = strings.Length();
928  strings.Push(valuePtr);
929  tagBufferSize += 3 + getBtagBufferSize(strings[index]);
930  break;
931  case 'f' :
932  index = floats.size();
933  floats.push_back((float)atof(valuePtr));
934  tagBufferSize += 7;
935  break;
936  default :
937  fprintf(stderr,
938  "samRecord::addTag() - Unknown custom field of type %c\n",
939  vtype);
940  myStatus.setStatus(SamStatus::FAIL_PARSE,
941  "Unknown custom field in a tag");
942  status = false;
943  break;
944  }
945  if(status)
946  {
947  // If successful, add the key to extras.
948  extras.Add(key, index);
949  }
950  }
951 
952  // Only add the tag if it has so far been successfully processed.
953  if(status)
954  {
955  // The buffer tags are now out of sync.
956  myNeedToSetTagsInBuffer = true;
957  myIsTagsBufferValid = false;
958  myIsBufferSynced = false;
959  myTagBufferSize += tagBufferSize;
960  }
961  return(status);
962 }

References addIntTag(), StatGenStatus::FAIL_PARSE, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
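
The buffer-size bookkeeping for a newly added tag can be read off the listing: each tag costs two bytes for the tag name, one byte for the type, and then the payload. The sketch below reproduces the constants for the 'A', 'Z', and 'f' cases only. `newTagBufferSize` is a hypothetical helper, not part of SamRecord, and the 'B' case is left out because it depends on `getBtagBufferSize`, which is not shown here.

```cpp
#include <cstring>

// Size in bytes that a newly added tag contributes to the BAM tag buffer,
// following the constants in the addTag listing.
// Returns -1 for types this sketch does not handle (including 'B').
int newTagBufferSize(char vtype, const char* value)
{
    switch (vtype)
    {
        case 'A':
            // 2 (tag) + 1 (type) + 1 (character).
            return 4;
        case 'Z':
            // 2 (tag) + 1 (type) + string length + 1 (null terminator).
            return 4 + static_cast<int>(std::strlen(value));
        case 'f':
            // 2 (tag) + 1 (type) + 4 (float).
            return 7;
        default:
            return -1;
    }
}
```

When an existing tag is replaced, the listing subtracts the old payload size and adds the new one, so only the difference ends up in `myTagBufferSize`.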

◆ checkFloat()

bool SamRecord::checkFloat ( const char *  tag)
inline

Check if the specified tag contains a float.

Does not set SamStatus.

Parameters
tag  SAM tag to check contents of.
Returns
true if the value associated with the tag is a float.

Definition at line 613 of file SamRecord.h.

613 { return checkTag(tag, 'f'); }

References checkTag().

◆ checkInteger()

bool SamRecord::checkInteger ( const char *  tag)
inline

Check if the specified tag contains an integer.

Does not set SamStatus.

Parameters
tag  SAM tag to check contents of.
Returns
true if the value associated with the tag is an integer.

Definition at line 607 of file SamRecord.h.

607 { return checkTag(tag, 'i'); }

References checkTag().

◆ checkString()

bool SamRecord::checkString ( const char *  tag)
inline

Check if the specified tag contains a string.

Does not set SamStatus.

Parameters
tag  SAM tag to check contents of.
Returns
true if the value associated with the tag is a string.

Definition at line 600 of file SamRecord.h.

601  { return(checkTag(tag, 'Z') || checkTag(tag, 'B')); }

References checkTag().

◆ checkTag()

bool SamRecord::checkTag ( const char *  tag,
char  type 
)

Check if the specified tag contains a value of the specified vtype.

Does not set SamStatus.

Parameters
tag  SAM tag to check contents of.
type  value type to check if the SAM tag matches.
Returns
true if the value associated with the tag is of the specified type.

Definition at line 2369 of file SamRecord.cpp.

2370 {
2371  // Init to success.
2372  myStatus = SamStatus::SUCCESS;
2373  // Parse the buffer if necessary.
2374  if(myNeedToSetTagsFromBuffer)
2375  {
2376  if(!setTagsFromBuffer())
2377  {
2378  // Failed to read the tags from the buffer, so cannot
2379  // get tags. setTagsFromBuffer set the error.
2380  return("");
2381  }
2382  }
2383 
2384  int key = MAKEKEY(tag[0], tag[1], type);
2385 
2386  return (extras.Find(key) != LH_NOTFOUND);
2387 }

References StatGenStatus::SUCCESS.

Referenced by checkFloat(), checkInteger(), and checkString().
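
Because the lookup key is built from both tag characters and the type, a tag is only reported as present when it is queried with the type it was stored under. The sketch below illustrates that idea with a plain `std::unordered_map`. `makeKey` and `TagIndex` are hypothetical, and the bit layout is an assumption: the library's `MAKEKEY` and its hash container are not shown in this section and may differ.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical packing of (tag[0], tag[1], type) into one integer key.
// The real MAKEKEY may use a different layout.
inline int32_t makeKey(char a, char b, char type)
{
    return (static_cast<int32_t>(static_cast<unsigned char>(type)) << 16) |
           (static_cast<int32_t>(static_cast<unsigned char>(b)) << 8) |
            static_cast<int32_t>(static_cast<unsigned char>(a));
}

// Minimal index from (tag, type) to a slot in some value array.
struct TagIndex
{
    std::unordered_map<int32_t, int> slots;

    void add(const char* tag, char type, int index)
    {
        slots[makeKey(tag[0], tag[1], type)] = index;
    }

    // Analogous to checkTag: present only under the exact type.
    bool check(const char* tag, char type) const
    {
        return slots.count(makeKey(tag[0], tag[1], type)) != 0;
    }

    // Analogous to checkString: string data is stored as 'Z' or 'B'.
    bool checkString(const char* tag) const
    {
        return check(tag, 'Z') || check(tag, 'B');
    }
};
```

With this layout, a tag stored as `'Z'` is found by `checkString` but not by `check(tag, 'f')`.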

◆ clearTags()

void SamRecord::clearTags ( )

Clear the tags in this record.

Does not set SamStatus.

Definition at line 965 of file SamRecord.cpp.

966 {
967  if(extras.Entries() != 0)
968  {
969  extras.Clear();
970  }
971  strings.Clear();
972  integers.Clear();
973  intType.clear();
974  floats.clear();
975  myTagBufferSize = 0;
976  resetTagIter();
977 }

References resetTagIter().

Referenced by resetRecord().

◆ get0BasedAlignmentEnd()

int32_t SamRecord::get0BasedAlignmentEnd ( )

Returns the 0-based inclusive rightmost position of the clipped sequence.

Returns
0-based inclusive rightmost position

Definition at line 1455 of file SamRecord.cpp.

1456 {
1457  myStatus = SamStatus::SUCCESS;
1458  if(myAlignmentLength == -1)
1459  {
1460  // Alignment end has not been set, so calculate it.
1461  parseCigar();
1462  }
1463  // If alignment length > 0, subtract 1 from it to get the end.
1464  if(myAlignmentLength == 0)
1465  {
1466  // Length is 0, just return the start position.
1467  return(myRecordPtr->myPosition);
1468  }
1469  return(myRecordPtr->myPosition + myAlignmentLength - 1);
1470 }

References StatGenStatus::SUCCESS.

Referenced by get0BasedUnclippedEnd(), get1BasedAlignmentEnd(), Pileup< TestPileupElement >::processAlignment(), Pileup< TestPileupElement >::processAlignmentRegion(), and CigarHelper::softClipEndByRefPos().
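
The arithmetic in the listing is small enough to restate as free functions. The names `alignmentEnd0` and `alignmentEnd1` below are hypothetical. They take the 0-based start and the alignment length as plain arguments instead of reading them from the record.

```cpp
#include <cstdint>

// 0-based inclusive end of an alignment that starts at pos0 and covers
// alignmentLength reference bases. A zero length returns the start itself.
int32_t alignmentEnd0(int32_t pos0, int32_t alignmentLength)
{
    if (alignmentLength == 0)
    {
        return pos0;
    }
    return pos0 + alignmentLength - 1;
}

// 1-based inclusive end: the 0-based end shifted by one.
int32_t alignmentEnd1(int32_t pos0, int32_t alignmentLength)
{
    return alignmentEnd0(pos0, alignmentLength) + 1;
}
```

An alignment starting at 0-based position 100 with length 50 therefore ends at 149 (0-based) or 150 (1-based).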

◆ get0BasedMatePosition()

int32_t SamRecord::get0BasedMatePosition ( )

Get the 0-based (BAM) leftmost mate/next fragment's position.

Returns
0-based leftmost position.

Definition at line 1440 of file SamRecord.cpp.

1441 {
1442  myStatus = SamStatus::SUCCESS;
1443  return myRecordPtr->myMatePosition;
1444 }

References StatGenStatus::SUCCESS.

◆ get0BasedPosition()

int32_t SamRecord::get0BasedPosition ( )

Get the 0-based (BAM) leftmost position of the record.

◆ get0BasedUnclippedEnd()

int32_t SamRecord::get0BasedUnclippedEnd ( )

Returns the 0-based inclusive right-most position adjusted for clipped bases.

Returns
0-based inclusive rightmost position including clips.

Definition at line 1514 of file SamRecord.cpp.

1515 {
1516  // myUnclippedEndOffset will be set by get0BasedAlignmentEnd if the
1517  // cigar has not yet been parsed, so no need to check it here.
1518  return(get0BasedAlignmentEnd() + myUnclippedEndOffset);
1519 }

References get0BasedAlignmentEnd().

Referenced by get1BasedUnclippedEnd().

◆ get0BasedUnclippedStart()

int32_t SamRecord::get0BasedUnclippedStart ( )

Returns the 0-based inclusive left-most position adjusted for clipped bases.

Returns
0-based inclusive leftmost position including clips.

Definition at line 1494 of file SamRecord.cpp.

1495 {
1496  myStatus = SamStatus::SUCCESS;
1497  if(myUnclippedStartOffset == -1)
1498  {
1499  // Unclipped has not yet been calculated, so parse the cigar to get it
1500  parseCigar();
1501  }
1502  return(myRecordPtr->myPosition - myUnclippedStartOffset);
1503 }

References StatGenStatus::SUCCESS.

Referenced by get1BasedUnclippedStart().
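
To make the clip adjustment concrete, the following standalone sketch derives the offsets from a SAM CIGAR string and applies them to a 0-based start position. It is not the library's `parseCigar`: `CigarSummary`, `summarizeCigar`, `unclippedStart0`, and `unclippedEnd0` are hypothetical names, and counting both soft (S) and hard (H) clips toward the offsets is an assumption of this sketch.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct CigarSummary
{
    int32_t refLength = 0;   // reference bases covered (M, D, N, =, X)
    int32_t beginClips = 0;  // clipped bases before the first aligned op
    int32_t endClips = 0;    // clipped bases after the first aligned op
};

CigarSummary summarizeCigar(const std::string& cigar)
{
    // Split the string into (count, operation) pairs.
    std::vector<std::pair<int32_t, char> > ops;
    int32_t count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(count, c);
            count = 0;
        }
    }

    CigarSummary s;
    bool seenNonClip = false;
    for (const auto& op : ops)
    {
        if (op.second == 'S' || op.second == 'H')
        {
            if (seenNonClip) s.endClips += op.first;
            else             s.beginClips += op.first;
            continue;
        }
        seenNonClip = true;
        switch (op.second)
        {
            case 'M': case 'D': case 'N': case '=': case 'X':
                s.refLength += op.first;  // consumes reference bases
                break;
            default:
                break;                    // e.g. 'I', 'P', or '*'
        }
    }
    return s;
}

int32_t unclippedStart0(int32_t pos0, const CigarSummary& s)
{
    return pos0 - s.beginClips;
}

int32_t unclippedEnd0(int32_t pos0, const CigarSummary& s)
{
    const int32_t end = (s.refLength == 0) ? pos0 : pos0 + s.refLength - 1;
    return end + s.endClips;
}
```

For a read at 0-based position 100 with CIGAR `5S90M5S`, this gives an unclipped start of 95 and an unclipped end of 194.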

◆ get1BasedAlignmentEnd()

int32_t SamRecord::get1BasedAlignmentEnd ( )

Returns the 1-based inclusive rightmost position of the clipped sequence.

Returns
1-based inclusive rightmost position

Definition at line 1474 of file SamRecord.cpp.

1475 {
1476  return(get0BasedAlignmentEnd() + 1);
1477 }

References get0BasedAlignmentEnd().

Referenced by getBin().

◆ get1BasedMatePosition()

int32_t SamRecord::get1BasedMatePosition ( )

Get the 1-based (SAM) leftmost mate/next fragment's position (PNEXT).

Returns
1-based leftmost position.

Definition at line 1433 of file SamRecord.cpp.

1434 {
1435  myStatus = SamStatus::SUCCESS;
1436  return (myRecordPtr->myMatePosition + 1);
1437 }

References StatGenStatus::SUCCESS.

◆ get1BasedPosition()

int32_t SamRecord::get1BasedPosition ( )

Get the 1-based (SAM) leftmost position (POS) of the record.

Returns
1-based leftmost position.

Definition at line 1300 of file SamRecord.cpp.

1301 {
1302  myStatus = SamStatus::SUCCESS;
1303  return (myRecordPtr->myPosition + 1);
1304 }

References StatGenStatus::SUCCESS.

Referenced by SamValidator::isValid().

◆ get1BasedUnclippedEnd()

int32_t SamRecord::get1BasedUnclippedEnd ( )

Returns the 1-based inclusive right-most position adjusted for clipped bases.

Returns
1-based inclusive rightmost position including clips.

Definition at line 1523 of file SamRecord.cpp.

1524 {
1525  return(get0BasedUnclippedEnd() + 1);
1526 }

References get0BasedUnclippedEnd().

◆ get1BasedUnclippedStart()

int32_t SamRecord::get1BasedUnclippedStart ( )

Returns the 1-based inclusive left-most position adjusted for clipped bases.

Returns
1-based inclusive leftmost position including clips.

Definition at line 1507 of file SamRecord.cpp.

1508 {
1509  return(get0BasedUnclippedStart() + 1);
1510 }

References get0BasedUnclippedStart().

◆ getAlignmentLength()

int32_t SamRecord::getAlignmentLength ( )

Returns the length of the clipped sequence, returning 0 if the cigar is '*'.

Returns
length of the clipped sequence.

Definition at line 1481 of file SamRecord.cpp.

1482 {
1483  myStatus = SamStatus::SUCCESS;
1484  if(myAlignmentLength == -1)
1485  {
1486  // Alignment end has not been set, so calculate it.
1487  parseCigar();
1488  }
1489  // Return the alignment length.
1490  return(myAlignmentLength);
1491 }

References StatGenStatus::SUCCESS.

◆ getBin()

uint16_t SamRecord::getBin ( )

Get the BAM bin for the record.

Returns
BAM bin

Definition at line 1335 of file SamRecord.cpp.

1336 {
1337  myStatus = SamStatus::SUCCESS;
1338  if(!myIsBinValid)
1339  {
1340  // The bin that is set in the record is not valid, so
1341  // reset it.
1342  myRecordPtr->myBin =
1343  bam_reg2bin(myRecordPtr->myPosition, get1BasedAlignmentEnd());
1344  myIsBinValid = true;
1345  }
1346  return(myRecordPtr->myBin);
1347 }

References get1BasedAlignmentEnd(), and StatGenStatus::SUCCESS.
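
The body of `bam_reg2bin` is not shown in this section. For orientation, the SAM specification gives a reference binning function that takes a 0-based start and a 0-based exclusive end, which matches the arguments passed above (the 0-based position and the 1-based inclusive end). The sketch below follows that reference function. `bamReg2Bin` is a hypothetical name, and whether the library's version is identical is an assumption.

```cpp
// Compute the BAM bin for the region [beg, end), with beg 0-based and
// end 0-based exclusive, following the reference function in the
// SAM specification.
int bamReg2Bin(int beg, int end)
{
    --end;  // convert to an inclusive end
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}
```

A region that fits inside one 16 kbp window lands in the finest level of bins. A region that crosses a window boundary moves up to the next coarser level.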

◆ getBlockSize()

int32_t SamRecord::getBlockSize ( )

Get the block size of the record (BAM format).

Returns
BAM block size of the record.

Definition at line 1269 of file SamRecord.cpp.

1270 {
1271  myStatus = SamStatus::SUCCESS;
1272  // If the buffer isn't synced, sync the buffer to determine the
1273  // block size.
1274  if(myIsBufferSynced == false)
1275  {
1276  // Since this just returns the block size, the translation of
1277  // the sequence does not matter, so just use the currently set
1278  // value.
1279  fixBuffer(myBufferSequenceTranslation);
1280  }
1281  return myRecordPtr->myBlockSize;
1282 }

References StatGenStatus::SUCCESS.

◆ getCigar()

const char * SamRecord::getCigar ( )

Returns the SAM formatted CIGAR string.

Returns
cigar string.

Definition at line 1543 of file SamRecord.cpp.

1544 {
1545  myStatus = SamStatus::SUCCESS;
1546  if(myCigar.Length() == 0)
1547  {
1548  // 0 Length, means that it is in the buffer, but has not yet
1549  // been synced to the string, so do the sync.
1550  parseCigarBinary();
1551  }
1552  return myCigar.c_str();
1553 }

References StatGenStatus::SUCCESS.

Referenced by getFields(), SamValidator::isValidCigar(), CigarHelper::softClipBeginByRefPos(), and CigarHelper::softClipEndByRefPos().

◆ getCigarInfo()

Cigar * SamRecord::getCigarInfo ( )

Returns a pointer to the Cigar object associated with this record.


The object is essentially read-only, only allowing modifications due to lazy evaluations.

Returns
pointer to the Cigar object.

Definition at line 1824 of file SamRecord.cpp.

1825 {
1826  // Check to see whether or not the Cigar has already been
1827  // set - this is determined by checking if alignment length
1828  // is set since alignment length and the cigar are set
1829  // at the same time.
1830  if(myAlignmentLength == -1)
1831  {
1832  // Not been set, so calculate it.
1833  parseCigar();
1834  }
1835  return(&myCigarRoller);
1836 }

Referenced by PileupElementBaseQual::addEntry(), SamRecordHelper::checkSequence(), SamTags::createMDTag(), getSequence(), SamQuerySeqWithRefIter::reset(), SamFilter::softClip(), CigarHelper::softClipBeginByRefPos(), and CigarHelper::softClipEndByRefPos().

◆ getCigarLength()

uint16_t SamRecord::getCigarLength ( )

Get the length of the BAM formatted CIGAR.

Returns
length of BAM formatted cigar.

Definition at line 1350 of file SamRecord.cpp.

1351 {
1352  myStatus = SamStatus::SUCCESS;
1353  // If the cigar buffer is valid
1354  // then get the length from there.
1355  if(myIsCigarBufferValid)
1356  {
1357  return myRecordPtr->myCigarLength;
1358  }
1359 
1360  if(myCigarTempBufferLength == -1)
1361  {
1362  // The cigar buffer is not valid and the cigar temp buffer is not set,
1363  // so parse the string.
1364  parseCigarString();
1365  }
1366 
1367  // The temp buffer is now set, so return the size.
1368  return(myCigarTempBufferLength);
1369 }

References StatGenStatus::SUCCESS.

◆ getFields() [1/2]

bool SamRecord::getFields ( bamRecordStruct &  recStruct,
String &  readName,
String &  cigar,
String &  sequence,
String &  quality 
)

Returns the values of all fields except the tags.

Parameters
recStruct    structure containing the contents of all non-variable length fields.
readName    read name from the record (return param)
cigar    cigar string from the record (return param)
sequence    sequence string from the record (return param)
quality    quality string from the record (return param)
Returns
true if all fields were successfully set, false otherwise.

Definition at line 1854 of file SamRecord.cpp.

1856 {
1857  return(getFields(recStruct, readName, cigar, sequence, quality,
1858  mySequenceTranslation));
1859 }

◆ getFields() [2/2]

bool SamRecord::getFields ( bamRecordStruct &  recStruct,
String &  readName,
String &  cigar,
String &  sequence,
String &  quality,
SequenceTranslation  translation 
)

Returns the values of all fields except the tags using the specified sequence translation.

Parameters
recStruct    structure containing the contents of all non-variable length fields.
readName    read name from the record (return param)
cigar    cigar string from the record (return param)
sequence    sequence string from the record (return param)
quality    quality string from the record (return param)
translation    type of sequence translation to use.
Returns
true if all fields were successfully set, false otherwise.

Definition at line 1863 of file SamRecord.cpp.

1866 {
1867  myStatus = SamStatus::SUCCESS;
1868  if(myIsBufferSynced == false)
1869  {
1870  if(!fixBuffer(translation))
1871  {
1872  // failed to set the buffer, return false.
1873  return(false);
1874  }
1875  }
1876  memcpy(&recStruct, myRecordPtr, sizeof(bamRecordStruct));
1877 
1878  readName = getReadName();
1879  // Check the status.
1880  if(myStatus != SamStatus::SUCCESS)
1881  {
1882  // Failed to set the fields, return false.
1883  return(false);
1884  }
1885  cigar = getCigar();
1886  // Check the status.
1887  if(myStatus != SamStatus::SUCCESS)
1888  {
1889  // Failed to set the fields, return false.
1890  return(false);
1891  }
1892  sequence = getSequence(translation);
1893  // Check the status.
1894  if(myStatus != SamStatus::SUCCESS)
1895  {
1896  // Failed to set the fields, return false.
1897  return(false);
1898  }
1899  quality = getQuality();
1900  // Check the status.
1901  if(myStatus != SamStatus::SUCCESS)
1902  {
1903  // Failed to set the fields, return false.
1904  return(false);
1905  }
1906  return(true);
1907 }

References getCigar(), getQuality(), getReadName(), getSequence(), and StatGenStatus::SUCCESS.

◆ getFlag()

uint16_t SamRecord::getFlag ( )

Get the flag (FLAG).

Returns
flag.

Definition at line 1372 of file SamRecord.cpp.

1373 {
1374  myStatus = SamStatus::SUCCESS;
1375  return myRecordPtr->myFlag;
1376 }

References StatGenStatus::SUCCESS.

Referenced by SamFilter::filterRead(), SamQuerySeqWithRefIter::getNextMatchMismatch(), SamValidator::isValid(), Pileup< TestPileupElement >::processFile(), and SamFile::ReadRecord().

◆ getFloatTag()

bool SamRecord::getFloatTag ( const char *  tag,
float &  tagVal 
)

Get the float value for the specified tag.

Parameters
tag    tag to retrieve
tagVal    return parameter with float value for the tag
Returns
true if the float tag was found and tagVal was set, false if not.

Definition at line 2269 of file SamRecord.cpp.

2270 {
2271  // Init to success.
2272  myStatus = SamStatus::SUCCESS;
2273  // Parse the buffer if necessary.
2274  if(myNeedToSetTagsFromBuffer)
2275  {
2276  if(!setTagsFromBuffer())
2277  {
2278  // Failed to read the tags from the buffer, so cannot
2279  // get tags. setTagsFromBuffer set the errors,
2280  // so just return false.
2281  return(false);
2282  }
2283  }
2284 
2285  int key = MAKEKEY(tag[0], tag[1], 'f');
2286  int offset = extras.Find(key);
2287 
2288  int value;
2289  if (offset < 0)
2290  {
2291  // Failed to find the tag.
2292  return(false);
2293  }
2294  else
2295  value = extras[offset];
2296 
2297  tagVal = floats[value];
2298  return(true);
2299 }

References StatGenStatus::SUCCESS.

◆ getInsertSize()

int32_t SamRecord::getInsertSize ( )

Get the inferred insert size of the read pair (ISIZE) or observed template length (TLEN).

Returns
inferred insert size or observed template length.

Definition at line 1447 of file SamRecord.cpp.

1448 {
1449  myStatus = SamStatus::SUCCESS;
1450  return myRecordPtr->myInsertSize;
1451 }

References StatGenStatus::SUCCESS.

◆ getIntegerTag() [1/2]

int * SamRecord::getIntegerTag ( const char *  tag)

Get the integer value for the specified tag. DEPRECATED: use the overload that returns a bool (success/failure).

Parameters
tag    tag to retrieve
Returns
pointer to the tag's integer value if found, NULL if not found.

Definition at line 2204 of file SamRecord.cpp.

2205 {
2206  // Init to success.
2207  myStatus = SamStatus::SUCCESS;
2208  // Parse the buffer if necessary.
2209  if(myNeedToSetTagsFromBuffer)
2210  {
2211  if(!setTagsFromBuffer())
2212  {
2213  // Failed to read the tags from the buffer, so cannot
2214  // get tags. setTagsFromBuffer set the errors,
2215  // so just return NULL.
2216  return(NULL);
2217  }
2218  }
2219 
2220  int key = MAKEKEY(tag[0], tag[1], 'i');
2221  int offset = extras.Find(key);
2222 
2223  int value;
2224  if (offset < 0)
2225  {
2226  // Failed to find the tag.
2227  return(NULL);
2228  }
2229  else
2230  value = extras[offset];
2231 
2232  return(&(integers[value]));
2233 }

References StatGenStatus::SUCCESS.

◆ getIntegerTag() [2/2]

bool SamRecord::getIntegerTag ( const char *  tag,
int &  tagVal 
)

Get the integer value for the specified tag.

Parameters
tag    tag to retrieve
tagVal    return parameter with integer value for the tag
Returns
true if the integer tag was found and tagVal was set, false if not.

Definition at line 2236 of file SamRecord.cpp.

2237 {
2238  // Init to success.
2239  myStatus = SamStatus::SUCCESS;
2240  // Parse the buffer if necessary.
2241  if(myNeedToSetTagsFromBuffer)
2242  {
2243  if(!setTagsFromBuffer())
2244  {
2245  // Failed to read the tags from the buffer, so cannot
2246  // get tags. setTagsFromBuffer set the errors,
2247  // so just return false.
2248  return(false);
2249  }
2250  }
2251 
2252  int key = MAKEKEY(tag[0], tag[1], 'i');
2253  int offset = extras.Find(key);
2254 
2255  int value;
2256  if (offset < 0)
2257  {
2258  // Failed to find the tag.
2259  return(false);
2260  }
2261  else
2262  value = extras[offset];
2263 
2264  tagVal = integers[value];
2265  return(true);
2266 }

References StatGenStatus::SUCCESS.

◆ getMapQuality()

uint8_t SamRecord::getMapQuality ( )

Get the mapping quality (MAPQ) of the record.

Returns
map quality.

Definition at line 1328 of file SamRecord.cpp.

1329 {
1330  myStatus = SamStatus::SUCCESS;
1331  return myRecordPtr->myMapQuality;
1332 }

References StatGenStatus::SUCCESS.

Referenced by SamValidator::isValid().

◆ getMateReferenceID()

int32_t SamRecord::getMateReferenceID ( )

Get the mate reference id of the record (BAM format: mate_rid/next_refID).

Returns
reference id

Definition at line 1426 of file SamRecord.cpp.

1427 {
1428  myStatus = SamStatus::SUCCESS;
1429  return myRecordPtr->myMateReferenceID;
1430 }

References StatGenStatus::SUCCESS.

◆ getMateReferenceName()

const char * SamRecord::getMateReferenceName ( )

Get the mate/next fragment's reference sequence name (RNEXT).

If it is equal to the reference name, it still returns the reference name.

Returns
reference sequence name

Definition at line 1398 of file SamRecord.cpp.

1399 {
1400  myStatus = SamStatus::SUCCESS;
1401  return myMateReferenceName.c_str();
1402 }

References StatGenStatus::SUCCESS.

◆ getMateReferenceNameOrEqual()

const char * SamRecord::getMateReferenceNameOrEqual ( )

Get the mate/next fragment's reference sequence name (RNEXT), returning "=" if it is the same as the reference name, unless they are both "*" in which case "*" is returned.

Returns
reference sequence name or '='

Definition at line 1408 of file SamRecord.cpp.

1409 {
1410  myStatus = SamStatus::SUCCESS;
1411  if(myMateReferenceName == "*")
1412  {
1413  return(myMateReferenceName);
1414  }
1415  if(myMateReferenceName == getReferenceName())
1416  {
1417  return(FIELD_ABSENT_STRING);
1418  }
1419  else
1420  {
1421  return(myMateReferenceName);
1422  }
1423 }

References getReferenceName(), and StatGenStatus::SUCCESS.

◆ getNextSamTag()

bool SamRecord::getNextSamTag ( char *  tag,
char &  vtype,
void **  value 
)

Get the next tag from the record.

Sets the Status to SUCCESS when a tag is successfully returned or when there are no more tags. Otherwise the status is set to describe why it failed (parsing, etc).

Parameters
tag    set to the tag when a tag is read.
vtype    set to the vtype when a tag is read.
value    pointer to the value of the tag (will need to cast to int, float, char, or string based on vtype).
Returns
true if a tag was read, false if there are no more tags.

Definition at line 1950 of file SamRecord.cpp.

1951 {
1952  myStatus = SamStatus::SUCCESS;
1953  if(myNeedToSetTagsFromBuffer)
1954  {
1955  if(!setTagsFromBuffer())
1956  {
1957  // Failed to read the tags from the buffer, so cannot
1958  // get tags.
1959  return(false);
1960  }
1961  }
1962 
1963  // Increment the tag index to start looking at the next tag.
1964  // At the beginning, it is set to -1.
1965  myLastTagIndex++;
1966  int maxTagIndex = extras.Capacity();
1967  if(myLastTagIndex >= maxTagIndex)
1968  {
1969  // Hit the end of the tags, return false, no more tags.
1970  // Status is still success since this is not an error,
1971  // it is just the end of the list.
1972  return(false);
1973  }
1974 
1975  bool tagFound = false;
1976  // Loop until a tag is found or the end of extras is hit.
1977  while((tagFound == false) && (myLastTagIndex < maxTagIndex))
1978  {
1979  if(extras.SlotInUse(myLastTagIndex))
1980  {
1981  // Found a slot to use.
1982  int key = extras.GetKey(myLastTagIndex);
1983  getTag(key, tag);
1984  getTypeFromKey(key, vtype);
1985  tagFound = true;
1986  // Get the value associated with the key based on the vtype.
1987  switch (vtype)
1988  {
1989  case 'f' :
1990  *value = getFloatPtr(myLastTagIndex);
1991  break;
1992  case 'i' :
1993  *value = getIntegerPtr(myLastTagIndex, vtype);
1994  if(vtype != 'A')
1995  {
1996  // Convert all int types to 'i'
1997  vtype = 'i';
1998  }
1999  break;
2000  case 'Z' :
2001  case 'B' :
2002  *value = getStringPtr(myLastTagIndex);
2003  break;
2004  default:
2005  myStatus.setStatus(SamStatus::FAIL_PARSE,
2006  "Unknown tag type");
2007  tagFound = false;
2008  break;
2009  }
2010  }
2011  if(!tagFound)
2012  {
2013  // Increment the index since a tag was not found.
2014  myLastTagIndex++;
2015  }
2016  }
2017  return(tagFound);
2018 }

References StatGenStatus::FAIL_PARSE, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

Referenced by SamRecordHelper::genSamTagsString().

◆ getNumOverlaps()

uint32_t SamRecord::getNumOverlaps ( int32_t  start,
int32_t  end 
)

Return the number of bases in this read that overlap the passed in region.

Matches & mismatches between the read and the reference are counted as overlaps, but insertions, deletions, skips, clips, and pads are not counted.

Parameters
start    inclusive 0-based start position (reference position) of the region to check for overlaps in (-1 means start at the beginning of the reference).
end    exclusive 0-based end position (reference position) of the region to check for overlaps in (-1 means go to the end of the reference).
Returns
number of overlapping bases

Definition at line 1841 of file SamRecord.cpp.

1842 {
1843  // Determine whether or not the cigar has been parsed, which sets up
1844  // the cigar roller. This is determined by checking the alignment length.
1845  if(myAlignmentLength == -1)
1846  {
1847  parseCigar();
1848  }
1849  return(myCigarRoller.getNumOverlaps(start, end, get0BasedPosition()));
1850 }

References get0BasedPosition(), and Cigar::getNumOverlaps().

Referenced by SamFile::GetNumOverlaps().
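
The counting rule described for getNumOverlaps() can be illustrated without the library. The sketch below is not `Cigar::getNumOverlaps()`, whose implementation is not shown on this page; `countOverlaps` and `CigarOp` are hypothetical names. It walks a list of CIGAR operations, counts only match/mismatch bases (`M`, `=`, `X`) whose reference position lies in `[start, end)`, and treats -1 as "unbounded" on either side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One CIGAR operation: the operation character and its length.
using CigarOp = std::pair<char, int32_t>;

// Sketch of the overlap rule: only M, = and X bases are counted.
// countOverlaps is hypothetical. start is inclusive, end is exclusive,
// both are 0-based reference positions, and -1 means "unbounded".
inline uint32_t countOverlaps(const std::vector<CigarOp>& cigar,
                              int32_t readStart, int32_t start, int32_t end)
{
    uint32_t overlaps = 0;
    int32_t refPos = readStart;
    for (const CigarOp& op : cigar)
    {
        assert(op.second >= 0);
        switch (op.first)
        {
            case 'M': case '=': case 'X':
            {
                // Clamp this aligned block to the requested region.
                int32_t lo = (start == -1) ? refPos : std::max(refPos, start);
                int32_t hi = (end == -1) ? refPos + op.second
                                         : std::min(refPos + op.second, end);
                if (hi > lo) overlaps += static_cast<uint32_t>(hi - lo);
                refPos += op.second;
                break;
            }
            case 'D': case 'N':
                refPos += op.second; // consumes reference but is not counted
                break;
            default:
                break; // I, S, H and P do not consume reference positions
        }
    }
    return overlaps;
}
```

For a read starting at reference position 100 with CIGAR 5M2D5M, the region [103, 110) covers two bases of the first match block and three of the second; the deleted positions in between are skipped.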

◆ getQuality() [1/2]

const char * SamRecord::getQuality ( )

Returns the SAM formatted quality string (QUAL).

Returns
quality string.

Definition at line 1626 of file SamRecord.cpp.

1627 {
1628  myStatus = SamStatus::SUCCESS;
1629  if(myQuality.Length() == 0)
1630  {
1631  // 0 Length, means that it is in the buffer, but has not yet
1632  // been synced to the string, so do the sync.
1633  setSequenceAndQualityFromBuffer();
1634  }
1635  return myQuality.c_str();
1636 }

References StatGenStatus::SUCCESS.

Referenced by PileupElementBaseQual::addEntry(), getFields(), SamValidator::isValidQuality(), and SamFilter::sumMismatchQuality().

◆ getQuality() [2/2]

char SamRecord::getQuality ( int  index)

Get the quality character at the specified index (0 to readLength - 1) into the quality string.

Throws an exception if index is out of range.

Parameters
index    index into the quality string (0 to readLength-1).
Returns
the quality character at the specified index into the quality.

Definition at line 1770 of file SamRecord.cpp.

1771 {
1772  // Determine the read length.
1773  int32_t readLen = getReadLength();
1774 
1775  // If the read length is 0, return ' ' whose ascii code is below
1776  // the minimum ascii code for qualities.
1777  if(readLen == 0)
1778  {
1779  return(BaseUtilities::UNKNOWN_QUALITY_CHAR);
1780  }
1781  else if((index < 0) || (index >= readLen))
1782  {
1783  // Only get here if the index was out of range, so throw an exception.
1784  String exceptionString = "SamRecord::getQuality(";
1785  exceptionString += index;
1786  exceptionString += ") is out of range. Index must be between 0 and ";
1787  exceptionString += (readLen - 1);
1788  throw std::runtime_error(exceptionString.c_str());
1789  }
1790 
1791  if(myQuality.Length() == 0)
1792  {
1793  // Parse BAM Quality.
1794  // Know that myPackedQuality is correct since readLen != 0.
1795  return(myPackedQuality[index] + 33);
1796  }
1797  else
1798  {
1799  // Already have string.
1800  if((myQuality.Length() == 1) && (myQuality[0] == '*'))
1801  {
1802  // Return the unknown quality character.
1803  return(BaseUtilities::UNKNOWN_QUALITY_CHAR);
1804  }
1805  else if(index >= myQuality.Length())
1806  {
1807  // Only get here if the index was out of range, so throw an exception.
1808  // Technically the myQuality string is not guaranteed to be the same length
1809  // as the sequence, so this catches that error.
1810  String exceptionString = "SamRecord::getQuality(";
1811  exceptionString += index;
1812  exceptionString += ") is out of range. Index must be between 0 and ";
1813  exceptionString += (myQuality.Length() - 1);
1814  throw std::runtime_error(exceptionString.c_str());
1815  }
1816  else
1817  {
1818  return(myQuality[index]);
1819  }
1820  }
1821 }

References getReadLength(), and BaseUtilities::UNKNOWN_QUALITY_CHAR.
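
When the quality string has not yet been built, the listing above reads the raw value from the BAM buffer and adds 33, which is the Phred+33 offset used by SAM text. A minimal standalone sketch of that conversion follows; `qualityChar` is a hypothetical helper, and a `std::vector` stands in for the record's packed quality buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: BAM stores each quality as a raw Phred score, and the SAM text
// form adds 33 to make it printable. qualityChar is hypothetical and is
// not part of SamRecord.
inline char qualityChar(const std::vector<uint8_t>& packedQuality, int index)
{
    assert(index >= 0 &&
           static_cast<std::size_t>(index) < packedQuality.size());
    return static_cast<char>(packedQuality[index] + 33);
}
```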

◆ getReadLength()

int32_t SamRecord::getReadLength ( )

Get the length of the read.

Returns
read length.

Definition at line 1379 of file SamRecord.cpp.

1380 {
1381  myStatus = SamStatus::SUCCESS;
1382  if(myIsSequenceBufferValid == false)
1383  {
1384  // If the sequence is "*", then return 0.
1385  if((mySequence.Length() == 1) && (mySequence[0] == '*'))
1386  {
1387  return(0);
1388  }
1389  // Do not add 1 since it is not null terminated.
1390  return(mySequence.Length());
1391  }
1392  return(myRecordPtr->myReadLength);
1393 }

References StatGenStatus::SUCCESS.

Referenced by SamFilter::clipOnMismatchThreshold(), SamQuerySeqWithRefIter::getNextMatchMismatch(), getQuality(), getSequence(), SamValidator::isValidCigar(), SamValidator::isValidQuality(), SamQuerySeqWithRefIter::reset(), and CigarHelper::softClipEndByRefPos().

◆ getReadName()

const char * SamRecord::getReadName ( )

Returns the SAM formatted Read Name (QNAME).

Returns
read name.

Definition at line 1530 of file SamRecord.cpp.

1531 {
1532  myStatus = SamStatus::SUCCESS;
1533  if(myReadName.Length() == 0)
1534  {
1535  // 0 Length, means that it is in the buffer, but has not yet
1536  // been synced to the string, so do the sync.
1537  myReadName = (char*)&(myRecordPtr->myData);
1538  }
1539  return myReadName.c_str();
1540 }

References StatGenStatus::SUCCESS.

Referenced by getFields(), SamValidator::isValid(), and SamFile::validateSortOrder().

◆ getReadNameLength()

uint8_t SamRecord::getReadNameLength ( )

Get the length of the read name (QNAME), including the terminating null.

Returns
length of the read name (including null).

Definition at line 1314 of file SamRecord.cpp.

1315 {
1316  myStatus = SamStatus::SUCCESS;
1317  // If the buffer is valid, return the size from there, otherwise get the
1318  // size from the string length + 1 (ending null).
1319  if(myIsReadNameBufferValid)
1320  {
1321  return(myRecordPtr->myReadNameLength);
1322  }
1323 
1324  return(myReadName.Length() + 1);
1325 }

References StatGenStatus::SUCCESS.

Referenced by SamValidator::isValid().

◆ getRecordBuffer() [1/2]

const void * SamRecord::getRecordBuffer ( )

Get a const pointer to the buffer that contains the BAM representation of the record.

Returns
const pointer to the buffer that contains the BAM representation of the record.

Definition at line 1192 of file SamRecord.cpp.

1193 {
1194  return(getRecordBuffer(mySequenceTranslation));
1195 }

◆ getRecordBuffer() [2/2]

const void * SamRecord::getRecordBuffer ( SequenceTranslation  translation)

Get a const pointer to the buffer that contains the BAM representation of the record using the specified translation on the sequence.

Parameters
translation    type of sequence translation to use.
Returns
const pointer to the buffer that contains the BAM representation of the record.

Definition at line 1199 of file SamRecord.cpp.

1200 {
1201  myStatus = SamStatus::SUCCESS;
1202  bool status = true;
1203  // If the buffer is not synced or the sequence in the buffer is not
1204  // properly translated, fix the buffer.
1205  if((myIsBufferSynced == false) ||
1206  (myBufferSequenceTranslation != translation))
1207  {
1208  status &= fixBuffer(translation);
1209  }
1210  // If the buffer is synced, check to see if the tags need to be synced.
1211  if(myNeedToSetTagsInBuffer)
1212  {
1213  status &= setTagsInBuffer();
1214  }
1215  if(!status)
1216  {
1217  return(NULL);
1218  }
1219  return (const void *)myRecordPtr;
1220 }

References StatGenStatus::SUCCESS.

◆ getReference()

GenomeSequence * SamRecord::getReference ( )

Returns a pointer to the genome sequence object associated with this record if it was set (NULL if it was not set).

Returns
pointer to the GenomeSequence object or NULL if there isn't one.

Definition at line 1911 of file SamRecord.cpp.

1912 {
1913  return(myRefPtr);
1914 }

Referenced by SamValidator::isValidTags().

◆ getReferenceID()

int32_t SamRecord::getReferenceID ( )

Get the reference sequence id of the record (BAM format rid).

Returns
reference sequence id

Definition at line 1293 of file SamRecord.cpp.

1294 {
1295  myStatus = SamStatus::SUCCESS;
1296  return myRecordPtr->myReferenceID;
1297 }

References StatGenStatus::SUCCESS.

Referenced by SamCoordOutput::add(), SamValidator::isValid(), Pileup< TestPileupElement >::processAlignment(), Pileup< TestPileupElement >::processAlignmentRegion(), and SamFile::validateSortOrder().

◆ getReferenceName()

const char * SamRecord::getReferenceName ( )

Get the reference sequence name (RNAME) of the record.

Returns
reference sequence name

Definition at line 1286 of file SamRecord.cpp.

1287 {
1288  myStatus = SamStatus::SUCCESS;
1289  return myReferenceName.c_str();
1290 }

References StatGenStatus::SUCCESS.

Referenced by PileupElement::addEntry(), SamTags::createMDTag(), getMateReferenceNameOrEqual(), getSequence(), SamValidator::isValid(), and SamQuerySeqWithRefIter::reset().

◆ getSequence() [1/4]

const char * SamRecord::getSequence ( )

Returns the SAM formatted sequence string (SEQ), translating the base as specified by setSequenceTranslation.

Returns
sequence string.

Definition at line 1556 of file SamRecord.cpp.

1557 {
1558  return(getSequence(mySequenceTranslation));
1559 }

Referenced by PileupElementBaseQual::addEntry(), SamRecordHelper::checkSequence(), SamTags::createMDTag(), getFields(), SamQuerySeqWithRefIter::getNextMatchMismatch(), getSequence(), and shiftIndelsLeft().

◆ getSequence() [2/4]

char SamRecord::getSequence ( int  index)

Get the sequence base at the specified index (0 to readLength - 1) into this sequence, translating the base as specified by setSequenceTranslation.

Throws an exception if index is out of range.

Parameters
index    index into the sequence string (0 to readLength-1).
Returns
the sequence base at the specified index into the sequence.

Definition at line 1639 of file SamRecord.cpp.

1640 {
1641  return(getSequence(index, mySequenceTranslation));
1642 }

References getSequence().

◆ getSequence() [3/4]

char SamRecord::getSequence ( int  index,
SequenceTranslation  translation 
)

Get the sequence base at the specified index (0 to readLength - 1) into this sequence, performing the specified sequence translation.

Throws an exception if index is out of range.

Parameters
index    index into the sequence string (0 to readLength-1).
translation    type of sequence translation to use.
Returns
the sequence base at the specified index into the sequence.

Definition at line 1645 of file SamRecord.cpp.

1646 {
1647  static const char * asciiBases = "=AC.G...T......N";
1648 
1649  // Determine the read length.
1650  int32_t readLen = getReadLength();
1651 
1652  // If the read length is 0, this method should not be called.
1653  if(readLen == 0)
1654  {
1655  String exceptionString = "SamRecord::getSequence(";
1656  exceptionString += index;
1657  exceptionString += ") is not allowed since sequence = '*'";
1658  throw std::runtime_error(exceptionString.c_str());
1659  }
1660  else if((index < 0) || (index >= readLen))
1661  {
1662  // Only get here if the index was out of range, so throw an exception.
1663  String exceptionString = "SamRecord::getSequence(";
1664  exceptionString += index;
1665  exceptionString += ") is out of range. Index must be between 0 and ";
1666  exceptionString += (readLen - 1);
1667  throw std::runtime_error(exceptionString.c_str());
1668  }
1669 
1670  // Determine if translation needs to be done.
1671  if((translation == NONE) || (myRefPtr == NULL))
1672  {
1673  // No translation needs to be done.
1674  if(mySequence.Length() == 0)
1675  {
1676  // Parse BAM sequence.
1677  if(myIsSequenceBufferValid)
1678  {
1679  return(index & 1 ?
1680  asciiBases[myPackedSequence[index / 2] & 0xF] :
1681  asciiBases[myPackedSequence[index / 2] >> 4]);
1682  }
1683  else
1684  {
1685  String exceptionString = "SamRecord::getSequence(";
1686  exceptionString += index;
1687  exceptionString += ") called with no sequence set";
1688  throw std::runtime_error(exceptionString.c_str());
1689  }
1690  }
1691  // Already have string.
1692  return(mySequence[index]);
1693  }
1694  else
1695  {
1696  // Need to translate the sequence either to have '=' or to not
1697  // have it.
1698  // First check to see if the sequence has been set.
1699  if(mySequence.Length() == 0)
1700  {
1701  // 0 Length, means that it is in the buffer, but has not yet
1702  // been synced to the string, so do the sync.
1703  setSequenceAndQualityFromBuffer();
1704  }
1705 
1706  // Check the type of translation.
1707  if(translation == EQUAL)
1708  {
1709  // Check whether or not the string has already been
1710  // retrieved that has the '=' in it.
1711  if(mySeqWithEq.length() == 0)
1712  {
1713  // The string with '=' has not yet been determined,
1714  // so get the string.
1715  // Check to see if the sequence is defined.
1716  if(mySequence == "*")
1717  {
1718  // Sequence is undefined, so no translation necessary.
1719  mySeqWithEq = '*';
1720  }
1721  else
1722  {
1723  // Sequence defined, so translate it.
1724  SamQuerySeqWithRef::seqWithEquals(mySequence.c_str(),
1725  myRecordPtr->myPosition,
1726  *(getCigarInfo()),
1727  getReferenceName(),
1728  *myRefPtr,
1729  mySeqWithEq);
1730  }
1731  }
1732  // Sequence is set, so return it.
1733  return(mySeqWithEq[index]);
1734  }
1735  else
1736  {
1737  // translation == BASES
1738  // Check whether or not the string has already been
1739  // retrieved that does not have the '=' in it.
1740  if(mySeqWithoutEq.length() == 0)
1741  {
1742  // The string without '=' has not yet been determined,
1743  // so get the string.
1744  // Check to see if the sequence is defined.
1745  if(mySequence == "*")
1746  {
1747  // Sequence is undefined, so no translation necessary.
1748  mySeqWithoutEq = '*';
1749  }
1750  else
1751  {
1752  // Sequence defined, so translate it.
1753  // The string without '=' has not yet been determined,
1754  // so get the string.
1755  SamQuerySeqWithRef::seqWithoutEquals(mySequence.c_str(),
1756  myRecordPtr->myPosition,
1757  *(getCigarInfo()),
1758  getReferenceName(),
1759  *myRefPtr,
1760  mySeqWithoutEq);
1761  }
1762  }
1763  // Sequence is set, so return it.
1764  return(mySeqWithoutEq[index]);
1765  }
1766  }
1767 }

References EQUAL, getCigarInfo(), getReadLength(), getReferenceName(), NONE, SamQuerySeqWithRef::seqWithEquals(), and SamQuerySeqWithRef::seqWithoutEquals().
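
In the untranslated BAM path above, two bases are packed into each byte: even indexes use the high nibble and odd indexes the low nibble, and the 4-bit code is looked up in the `asciiBases` table. A minimal standalone sketch of that lookup follows, using the same table as the listing; `unpackBase` is a hypothetical helper, and a `std::vector` stands in for the record's packed sequence buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: extract one base from a 4-bit packed BAM sequence.
// unpackBase is hypothetical; the lookup table is copied from the
// listing above.
inline char unpackBase(const std::vector<uint8_t>& packedSequence, int index)
{
    static const char asciiBases[] = "=AC.G...T......N";
    assert(index >= 0 &&
           static_cast<std::size_t>(index / 2) < packedSequence.size());
    uint8_t byte = packedSequence[index / 2];
    // Odd index: low nibble. Even index: high nibble.
    return (index & 1) ? asciiBases[byte & 0xF] : asciiBases[byte >> 4];
}
```

For example, the bytes 0x12 0x48 decode to the bases A, C, G, T.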

◆ getSequence() [4/4]

const char * SamRecord::getSequence ( SequenceTranslation  translation)

Returns the SAM formatted sequence string (SEQ) performing the specified sequence translation.

Parameters
translation    type of sequence translation to use.
Returns
sequence string.

Definition at line 1562 of file SamRecord.cpp.

1563 {
1564  myStatus = SamStatus::SUCCESS;
1565  if(mySequence.Length() == 0)
1566  {
1567  // 0 Length, means that it is in the buffer, but has not yet
1568  // been synced to the string, so do the sync.
1569  setSequenceAndQualityFromBuffer();
1570  }
1571 
1572  // Determine if translation needs to be done.
1573  if((translation == NONE) || (myRefPtr == NULL))
1574  {
1575  return mySequence.c_str();
1576  }
1577  else if(translation == EQUAL)
1578  {
1579  if(mySeqWithEq.length() == 0)
1580  {
1581  // Check to see if the sequence is defined.
1582  if(mySequence == "*")
1583  {
1584  // Sequence is undefined, so no translation necessary.
1585  mySeqWithEq = '*';
1586  }
1587  else
1588  {
1589  // Sequence defined, so translate it.
1590  SamQuerySeqWithRef::seqWithEquals(mySequence.c_str(),
1591  myRecordPtr->myPosition,
1592  *(getCigarInfo()),
1593  getReferenceName(),
1594  *myRefPtr,
1595  mySeqWithEq);
1596  }
1597  }
1598  return(mySeqWithEq.c_str());
1599  }
1600  else
1601  {
1602  // translation == BASES
1603  if(mySeqWithoutEq.length() == 0)
1604  {
1605  if(mySequence == "*")
1606  {
1607  // Sequence is undefined, so no translation necessary.
1608  mySeqWithoutEq = '*';
1609  }
1610  else
1611  {
1612  // Sequence defined, so translate it.
1613  SamQuerySeqWithRef::seqWithoutEquals(mySequence.c_str(),
1614  myRecordPtr->myPosition,
1615  *(getCigarInfo()),
1616  getReferenceName(),
1617  *myRefPtr,
1618  mySeqWithoutEq);
1619  }
1620  }
1621  return(mySeqWithoutEq.c_str());
1622  }
1623 }

References EQUAL, getCigarInfo(), getReferenceName(), NONE, SamQuerySeqWithRef::seqWithEquals(), SamQuerySeqWithRef::seqWithoutEquals(), and StatGenStatus::SUCCESS.

◆ getStatus()

const SamStatus & SamRecord::getStatus ( )

Returns the status associated with the last method that sets the status.

Returns
SamStatus of the last command that sets status.

Definition at line 2391 of file SamRecord.cpp.

2392 {
2393  return(myStatus);
2394 }

◆ getStringTag()

const String * SamRecord::getStringTag ( const char *  tag)

Get the string value for the specified tag.

Parameters
tag    tag to retrieve
Returns
pointer to the tag's string value if found, NULL if not found.

Definition at line 2168 of file SamRecord.cpp.

2169 {
2170  // Parse the buffer if necessary.
2171  if(myNeedToSetTagsFromBuffer)
2172  {
2173  if(!setTagsFromBuffer())
2174  {
2175  // Failed to read the tags from the buffer, so cannot
2176  // get tags. setTagsFromBuffer set the errors,
2177  // so just return null.
2178  return(NULL);
2179  }
2180  }
2181 
2182  int key = MAKEKEY(tag[0], tag[1], 'Z');
2183  int offset = extras.Find(key);
2184 
2185  int value;
2186  if (offset < 0)
2187  {
2188  // Check for 'B' tag.
2189  key = MAKEKEY(tag[0], tag[1], 'B');
2190  offset = extras.Find(key);
2191  if(offset < 0)
2192  {
2193  // Tag not found.
2194  return(NULL);
2195  }
2196  }
2197 
2198  // Offset is valid, so return the tag.
2199  value = extras[offset];
2200  return(&(strings[value]));
2201 }

Referenced by SamTags::isMDTagCorrect(), and SamValidator::isValidTags().

◆ getTagLength()

uint32_t SamRecord::getTagLength ( )

Returns the length of the BAM formatted tags.

Returns
length of the BAM formatted tags.

Definition at line 1917 of file SamRecord.cpp.

1918 {
1919  myStatus = SamStatus::SUCCESS;
1920  if(myNeedToSetTagsFromBuffer)
1921  {
1922  // Tags are only set in the buffer, so the size of the tags is
1923  // the length of the record minus the starting location of the tags.
1924  unsigned char * tagStart =
1925  (unsigned char *)myRecordPtr->myData
1926  + myRecordPtr->myReadNameLength
1927  + myRecordPtr->myCigarLength * sizeof(int)
1928  + (myRecordPtr->myReadLength + 1) / 2 + myRecordPtr->myReadLength;
1929 
1930  // The non-tags take up from the start of the record to the tag start.
1931  // Do not include the block size part of the record since it is not
1932  // included in the size.
1933  uint32_t nonTagSize =
1934  tagStart - (unsigned char*)&(myRecordPtr->myReferenceID);
1935  // Tags take up the size of the block minus the non-tag section.
1936  uint32_t tagSize = myRecordPtr->myBlockSize - nonTagSize;
1937  return(tagSize);
1938  }
1939 
1940  // Tags are stored outside the buffer, so myTagBufferSize is set.
1941  return(myTagBufferSize);
1942 }

References StatGenStatus::SUCCESS.
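
The buffer arithmetic in getTagLength() can be checked by hand with a small standalone function. This is a sketch under two stated assumptions rather than the library's code: that the fixed-length fields following the block size occupy 32 bytes, as laid out in the BAM specification, and that each CIGAR operation occupies 4 bytes (the listing uses `sizeof(int)`). `tagBytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the tag-length arithmetic. tagBytes is hypothetical.
// Assumption (BAM specification): the fixed-length fields that follow
// block_size occupy 32 bytes, and a CIGAR operation is 4 bytes.
inline uint32_t tagBytes(uint32_t blockSize, uint32_t readNameLength,
                         uint32_t cigarLength, uint32_t readLength)
{
    const uint32_t fixedFields = 32;
    uint32_t nonTagSize = fixedFields
        + readNameLength          // QNAME, including its terminating null
        + cigarLength * 4         // one 32-bit word per CIGAR operation
        + (readLength + 1) / 2    // bases, packed two per byte
        + readLength;             // one quality byte per base
    assert(blockSize >= nonTagSize);
    return blockSize - nonTagSize;
}
```

With a 5-byte read name, one CIGAR operation, and a 4-base read, the non-tag part is 32 + 5 + 4 + 2 + 4 = 47 bytes, so a 50-byte block leaves 3 bytes of tags.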

◆ getTagsString()

bool SamRecord::getTagsString ( const char *  tags,
String &  returnString,
char  delim = '\t' 
)

Get the string representation of the tags from the record, formatted as TAG:TYPE:VALUE<delim>TAG:TYPE:VALUE...

Sets the Status to SUCCESS when the tags are successfully returned or the tags were not found. If a different error occurred, the status is set appropriately. The delimiter between the tags to retrieve is ',' or ';'. ',' was added because the original delimiter, ';', requires the string to be quoted on the command line.

Parameters
tags  the tags to retrieve, formatted as TAG:TYPE,TAG:TYPE...
returnString  the String to set (this method first clears returnString) to TAG:TYPE:VALUE<delim>TAG:TYPE:VALUE...
delim  delimiter to use to separate two tags, default is a tab.
Returns
true if there were not any errors even if no tags were found.

Definition at line 2070 of file SamRecord.cpp.

2071 {
2072  const char* currentTagPtr = tags;
2073 
2074  returnString.Clear();
2075  myStatus = SamStatus::SUCCESS;
2076  if(myNeedToSetTagsFromBuffer)
2077  {
2078  if(!setTagsFromBuffer())
2079  {
2080  // Failed to read the tags from the buffer, so cannot
2081  // get tags.
2082  return(false);
2083  }
2084  }
2085 
2086  bool returnStatus = true;
2087 
2088  while(*currentTagPtr != '\0')
2089  {
2090  // Tags are formatted as: XY:Z
2091  // Where X is [A-Za-z], Y is [A-Za-z], and
2092  // Z is A,i,f,Z,H (cCsSI are also accepted)
2093  if((currentTagPtr[0] == '\0') || (currentTagPtr[1] == '\0') ||
2094  (currentTagPtr[2] != ':') || (currentTagPtr[3] == '\0'))
2095  {
2096  myStatus.setStatus(SamStatus::INVALID,
2097  "getTagsString called with improperly formatted tags.\n");
2098  returnStatus = false;
2099  break;
2100  }
2101 
2102  // Construct the key.
2103  int key = MAKEKEY(currentTagPtr[0], currentTagPtr[1],
2104  currentTagPtr[3]);
2105  // Look to see if the key exists in the hash.
2106  int offset = extras.Find(key);
2107 
2108  if(offset >= 0)
2109  {
2110  // Offset is set, so the key was found.
2111  if(!returnString.IsEmpty())
2112  {
2113  returnString += delim;
2114  }
2115  returnString += currentTagPtr[0];
2116  returnString += currentTagPtr[1];
2117  returnString += ':';
2118  returnString += currentTagPtr[3];
2119  returnString += ':';
2120 
2121  // First if it is an integer, determine the actual type of the int.
2122  char vtype;
2123  getTypeFromKey(key, vtype);
2124 
2125  switch(vtype)
2126  {
2127  case 'i':
2128  returnString += *(int*)getIntegerPtr(offset, vtype);
2129  break;
2130  case 'f':
2131  returnString += *(float*)getFloatPtr(offset);
2132  break;
2133  case 'Z':
2134  case 'B':
2135  returnString += *(String*)getStringPtr(offset);
2136  break;
2137  default:
2138  myStatus.setStatus(SamStatus::INVALID,
2139  "rmTag called with unknown type.\n");
2140  returnStatus = false;
2141  break;
2142  };
2143  }
2144  // Increment to the next tag.
2145  if((currentTagPtr[4] == ';') || (currentTagPtr[4] == ','))
2146  {
2147  // Increment once more.
2148  currentTagPtr += 5;
2149  }
2150  else if(currentTagPtr[4] != '\0')
2151  {
2152  // Invalid tag format.
2153  myStatus.setStatus(SamStatus::INVALID,
2154  "rmTags called with improperly formatted tags.\n");
2155  returnStatus = false;
2156  break;
2157  }
2158  else
2159  {
2160  // Last Tag.
2161  currentTagPtr += 4;
2162  }
2163  }
2164  return(returnStatus);
2165 }

References StatGenStatus::INVALID, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
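
The way the listing walks the `tags` argument (four characters per entry, then an optional ',' or ';') can be sketched in isolation. This is an illustrative standalone function, not libStatGen code: `parseTagList` is a hypothetical name, and it only splits the list into (tag, type) pairs without looking anything up in a record.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical standalone helper, not part of libStatGen.
// Parses a list such as "NM:i,MD:Z;XT:A" into (tag, type) pairs.
// Returns false if the list is not made of XY:Z entries separated by
// ',' or ';'. Entries parsed before the error are kept in 'out'.
bool parseTagList(const std::string& tags,
                  std::vector<std::pair<std::string, char> >& out)
{
    out.clear();
    std::size_t pos = 0;
    while (pos < tags.size())
    {
        // Each entry needs two tag characters, a ':', and a type character.
        if (pos + 4 > tags.size() || tags[pos + 2] != ':')
        {
            return false;
        }
        out.push_back(std::make_pair(tags.substr(pos, 2), tags[pos + 3]));
        pos += 4;
        if (pos == tags.size())
        {
            break;  // Last entry.
        }
        if (tags[pos] != ',' && tags[pos] != ';')
        {
            return false;  // Unexpected character after the type.
        }
        ++pos;  // Skip the delimiter.
    }
    return true;
}
```

As in the listing, an empty list is accepted and produces no entries, and the two delimiters may be mixed within one list.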

◆ isCharType()

bool SamRecord::isCharType ( char  vtype)
static

Returns whether or not the specified vtype is a char type.

Does not set SamStatus.

Parameters
vtype  value type to check.
Returns
true if the passed in vtype is a char ('A'), false otherwise.

Definition at line 2050 of file SamRecord.cpp.

2051 {
2052  if(vtype == 'A')
2053  {
2054  return(true);
2055  }
2056  return(false);
2057 }

Referenced by SamRecordHelper::genSamTagString().

◆ isFloatType()

bool SamRecord::isFloatType ( char  vtype)
static

Returns whether or not the specified vtype is a float type.

Does not set SamStatus.

Parameters
vtype  value type to check.
Returns
true if the passed in vtype is a float ('f'), false otherwise.

Definition at line 2040 of file SamRecord.cpp.

2041 {
2042  if(vtype == 'f')
2043  {
2044  return(true);
2045  }
2046  return(false);
2047 }

Referenced by SamRecordHelper::genSamTagString().

◆ isIntegerType()

bool SamRecord::isIntegerType ( char  vtype)
static

Returns whether or not the specified vtype is an integer type.

Does not set SamStatus.

Parameters
vtype  value type to check.
Returns
true if the passed in vtype is an integer ('c', 'C', 's', 'S', 'i', 'I'), false otherwise.

Definition at line 2028 of file SamRecord.cpp.

2029 {
2030  if((vtype == 'c') || (vtype == 'C') ||
2031  (vtype == 's') || (vtype == 'S') ||
2032  (vtype == 'i') || (vtype == 'I'))
2033  {
2034  return(true);
2035  }
2036  return(false);
2037 }

Referenced by SamRecordHelper::genSamTagString().

◆ isStringType()

bool SamRecord::isStringType ( char  vtype)
static

Returns whether or not the specified vtype is a string type.

Does not set SamStatus.

Parameters
vtype  value type to check.
Returns
true if the passed in vtype is a string ('Z'/'B'), false otherwise.

Definition at line 2060 of file SamRecord.cpp.

2061 {
2062  if((vtype == 'Z') || (vtype == 'B'))
2063  {
2064  return(true);
2065  }
2066  return(false);
2067 }

Referenced by SamRecordHelper::genSamTagString().
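
The four static predicates (isCharType, isFloatType, isIntegerType, isStringType) partition the type characters into disjoint groups. A caller that needs to branch on the group could fold them into one classification step. The sketch below is standalone and hypothetical (`TagValueKind` and `classifyTagType` are not libStatGen names); it only mirrors the character sets shown in the listings above.

```cpp
#include <cassert>

// Hypothetical classification, not part of libStatGen.
enum class TagValueKind { Integer, Float, Char, String, Unknown };

// Maps a tag type character to its value category, using the same
// character sets as the four predicates documented above.
TagValueKind classifyTagType(char vtype)
{
    switch (vtype)
    {
        case 'c': case 'C':
        case 's': case 'S':
        case 'i': case 'I':
            return TagValueKind::Integer;
        case 'f':
            return TagValueKind::Float;
        case 'A':
            return TagValueKind::Char;
        case 'Z': case 'B':
            return TagValueKind::String;
        default:
            return TagValueKind::Unknown;
    }
}
```

Note that 'B' (array) values fall into the string group here because the listing for isStringType treats them that way.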

◆ isValid()

bool SamRecord::isValid ( SamFileHeader & header)

Returns whether or not the record is valid, setting the status to indicate success or failure.

Parameters
header  SAM header associated with the record. Used to perform some validation against the header.
Returns
true if the record is valid, false if not.

Definition at line 161 of file SamRecord.cpp.

162 {
163  myStatus = SamStatus::SUCCESS;
164  SamValidationErrors invalidSamErrors;
165  if(!SamValidator::isValid(header, *this, invalidSamErrors))
166  {
167  // The record is not valid.
168  std::string errorMessage = "";
169  invalidSamErrors.getErrorString(errorMessage);
170  myStatus.setStatus(SamStatus::INVALID, errorMessage.c_str());
171  return(false);
172  }
173  // The record is valid.
174  return(true);
175 }

References SamValidationErrors::getErrorString(), StatGenStatus::INVALID, SamValidator::isValid(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

◆ resetRecord()

void SamRecord::resetRecord ( )

Reset the fields of the record to a default value.

This is not necessary when reading a SAM/BAM file, but when you are setting fields yourself it is a good idea to reset a record before reusing it. After a reset, only the fields that have values need to be set; the rest keep their defaults.

Definition at line 91 of file SamRecord.cpp.

92 {
93  myIsBufferSynced = true;
94 
95  myRecordPtr->myBlockSize = DEFAULT_BLOCK_SIZE;
96  myRecordPtr->myReferenceID = -1;
97  myRecordPtr->myPosition = -1;
98  myRecordPtr->myReadNameLength = DEFAULT_READ_NAME_LENGTH;
99  myRecordPtr->myMapQuality = 0;
100  myRecordPtr->myBin = DEFAULT_BIN;
101  myRecordPtr->myCigarLength = 0;
102  myRecordPtr->myFlag = 0;
103  myRecordPtr->myReadLength = 0;
104  myRecordPtr->myMateReferenceID = -1;
105  myRecordPtr->myMatePosition = -1;
106  myRecordPtr->myInsertSize = 0;
107 
108  // Set the sam values for the variable length fields.
109  // TODO - one way to speed this up might be to not set to "*" and just
110  // clear them, and write out a '*' for SAM if it is empty.
111  myReadName = DEFAULT_READ_NAME;
112  myReferenceName = "*";
113  myMateReferenceName = "*";
114  myCigar = "*";
115  mySequence = "*";
116  mySeqWithEq.clear();
117  mySeqWithoutEq.clear();
118  myQuality = "*";
119  myNeedToSetTagsFromBuffer = false;
120  myNeedToSetTagsInBuffer = false;
121 
122  // Initialize the calculated alignment info to the uncalculated value.
123  myAlignmentLength = -1;
124  myUnclippedStartOffset = -1;
125  myUnclippedEndOffset = -1;
126 
127  clearTags();
128 
129  // Set the bam values for the variable length fields.
130  // Only the read name needs to be set, the others are a length of 0.
131  // Set the read name. The min size of myRecordPtr includes the size for
132  // the default read name.
133  memcpy(&(myRecordPtr->myData), myReadName.c_str(),
134  myRecordPtr->myReadNameLength);
135 
136  // Set that the variable length buffer fields are valid.
137  myIsReadNameBufferValid = true;
138  myIsCigarBufferValid = true;
139  myPackedSequence =
140  (unsigned char *)myRecordPtr->myData + myRecordPtr->myReadNameLength +
141  myRecordPtr->myCigarLength * sizeof(int);
142  myIsSequenceBufferValid = true;
143  myBufferSequenceTranslation = NONE;
144 
145  myPackedQuality = myPackedSequence;
146  myIsQualityBufferValid = true;
147  myIsTagsBufferValid = true;
148  myIsBinValid = true;
149 
150  myCigarTempBufferLength = -1;
151 
152  myStatus = SamStatus::SUCCESS;
153 
154  NOT_FOUND_TAG_STRING = "";
155  NOT_FOUND_TAG_INT = -1; // TODO - deprecate
156 }

References clearTags(), NONE, and StatGenStatus::SUCCESS.

Referenced by SamRecord(), setBuffer(), setBufferFromFile(), and ~SamRecord().

◆ rmTag()

bool SamRecord::rmTag ( const char *  tag,
char  type 
)

Remove a tag.

Parameters
tag  tag to remove.
type  type of the tag to be removed.
Returns
true if the tag no longer exists in the record (including when it was not found to begin with), false if it could not be removed.

Definition at line 980 of file SamRecord.cpp.

981 {
982  // Check the length of tag.
983  if(strlen(tag) != 2)
984  {
985  // Tag is the wrong length.
986  myStatus.setStatus(SamStatus::INVALID,
987  "rmTag called with tag that is not 2 characters\n");
988  return(false);
989  }
990 
991  myStatus = SamStatus::SUCCESS;
992  if(myNeedToSetTagsFromBuffer)
993  {
994  if(!setTagsFromBuffer())
995  {
996  // Failed to read the tags from the buffer, so cannot
997  // get tags.
998  return(false);
999  }
1000  }
1001 
1002  // Construct the key.
1003  int key = MAKEKEY(tag[0], tag[1], type);
1004  // Look to see if the key exists in the hash.
1005  int offset = extras.Find(key);
1006 
1007  if(offset < 0)
1008  {
1009  // Not found, so return true, successfully removed since
1010  // it is not in tag.
1011  return(true);
1012  }
1013 
1014  // Offset is set, so the key was found.
1015  // First if it is an integer, determine the actual type of the int.
1016  char vtype;
1017  getTypeFromKey(key, vtype);
1018  if(vtype == 'i')
1019  {
1020  vtype = getIntegerType(offset);
1021  }
1022 
1023  // Offset is set, so recalculate the buffer size without this entry.
1024  // Do NOT remove from strings, integers, or floats because then
1025  // extras would need to be updated for all entries with the new indexes
1026  // into those variables.
1027  int rmBuffSize = 0;
1028  switch(vtype)
1029  {
1030  case 'A':
1031  case 'c':
1032  case 'C':
1033  rmBuffSize = 4;
1034  break;
1035  case 's':
1036  case 'S':
1037  rmBuffSize = 5;
1038  break;
1039  case 'i':
1040  case 'I':
1041  rmBuffSize = 7;
1042  break;
1043  case 'f':
1044  rmBuffSize = 7;
1045  break;
1046  case 'Z':
1047  rmBuffSize = 4 + getString(offset).Length();
1048  break;
1049  case 'B':
1050  rmBuffSize = 3 + getBtagBufferSize(getString(offset));
1051  break;
1052  default:
1053  myStatus.setStatus(SamStatus::INVALID,
1054  "rmTag called with unknown type.\n");
1055  return(false);
1056  break;
1057  };
1058 
1059  // The buffer tags are now out of sync.
1060  myNeedToSetTagsInBuffer = true;
1061  myIsTagsBufferValid = false;
1062  myIsBufferSynced = false;
1063  myTagBufferSize -= rmBuffSize;
1064 
1065  // Remove from the hash.
1066  extras.Delete(offset);
1067  return(true);
1068 }

References getString(), StatGenStatus::INVALID, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
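
The `rmBuffSize` values in the switch follow from the BAM encoding of a tag: two tag characters and one type character, followed by the value. A minimal standalone sketch of that arithmetic, assuming the encoding in the SAM specification, is given here; `bamTagSize` is a hypothetical helper, and array ('B') tags are left out because their size depends on the element type and count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of libStatGen.
// Returns the BAM-encoded size in bytes of one non-array tag, or 0 if
// the type is not handled. stringLength is used only for type 'Z' and
// excludes the null terminator.
std::size_t bamTagSize(char vtype, std::size_t stringLength = 0)
{
    const std::size_t header = 3;  // 2 tag characters + 1 type character
    switch (vtype)
    {
        case 'A': case 'c': case 'C':
            return header + 1;                 // 1-byte value
        case 's': case 'S':
            return header + 2;                 // 2-byte value
        case 'i': case 'I': case 'f':
            return header + 4;                 // 4-byte value
        case 'Z':
            return header + stringLength + 1;  // string + null terminator
        default:
            return 0;
    }
}
```

These results match the constants in the listing: 4, 5, 7, and 4 plus the string length.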

◆ rmTags()

bool SamRecord::rmTags ( const char *  tags)

Remove tags.

The delimiter between the tags is ',' or ';'. ',' was added since the original delimiter, ';', requires the string to be quoted on the command-line.

Parameters
tags  tags to remove, formatted as Tag:Type,Tag:Type,Tag:Type...
Returns
true if none of the tags exist in the record any longer (including tags that were not found to begin with), false if any could not be removed. SamStatus is set to INVALID if the tags are incorrectly formatted.

Definition at line 1071 of file SamRecord.cpp.

1072 {
1073  const char* currentTagPtr = tags;
1074 
1075  myStatus = SamStatus::SUCCESS;
1076  if(myNeedToSetTagsFromBuffer)
1077  {
1078  if(!setTagsFromBuffer())
1079  {
1080  // Failed to read the tags from the buffer, so cannot
1081  // get tags.
1082  return(false);
1083  }
1084  }
1085 
1086  bool returnStatus = true;
1087 
1088  int rmBuffSize = 0;
1089  while(*currentTagPtr != '\0')
1090  {
1091 
1092  // Tags are formatted as: XY:Z
1093  // Where X is [A-Za-z], Y is [A-Za-z], and
1094  // Z is A,i,f,Z,H (cCsSI are also accepted)
1095  if((currentTagPtr[0] == '\0') || (currentTagPtr[1] == '\0') ||
1096  (currentTagPtr[2] != ':') || (currentTagPtr[3] == '\0'))
1097  {
1098  myStatus.setStatus(SamStatus::INVALID,
1099  "rmTags called with improperly formatted tags.\n");
1100  returnStatus = false;
1101  break;
1102  }
1103 
1104  // Construct the key.
1105  int key = MAKEKEY(currentTagPtr[0], currentTagPtr[1],
1106  currentTagPtr[3]);
1107  // Look to see if the key exists in the hash.
1108  int offset = extras.Find(key);
1109 
1110  if(offset >= 0)
1111  {
1112  // Offset is set, so the key was found.
1113  // First if it is an integer, determine the actual type of the int.
1114  char vtype;
1115  getTypeFromKey(key, vtype);
1116  if(vtype == 'i')
1117  {
1118  vtype = getIntegerType(offset);
1119  }
1120 
1121  // Offset is set, so recalculate the buffer size without this entry.
1122  // Do NOT remove from strings, integers, or floats because then
1123  // extras would need to be updated for all entries with the new indexes
1124  // into those variables.
1125  switch(vtype)
1126  {
1127  case 'A':
1128  case 'c':
1129  case 'C':
1130  rmBuffSize += 4;
1131  break;
1132  case 's':
1133  case 'S':
1134  rmBuffSize += 5;
1135  break;
1136  case 'i':
1137  case 'I':
1138  rmBuffSize += 7;
1139  break;
1140  case 'f':
1141  rmBuffSize += 7;
1142  break;
1143  case 'Z':
1144  rmBuffSize += 4 + getString(offset).Length();
1145  break;
1146  case 'B':
1147  rmBuffSize += 3 + getBtagBufferSize(getString(offset));
1148  break;
1149  default:
1150  myStatus.setStatus(SamStatus::INVALID,
1151  "rmTag called with unknown type.\n");
1152  returnStatus = false;
1153  break;
1154  };
1155 
1156  // Remove from the hash.
1157  extras.Delete(offset);
1158  }
1159  // Increment to the next tag.
1160  if((currentTagPtr[4] == ';') || (currentTagPtr[4] == ','))
1161  {
1162  // Increment once more.
1163  currentTagPtr += 5;
1164  }
1165  else if(currentTagPtr[4] != '\0')
1166  {
1167  // Invalid tag format.
1168  myStatus.setStatus(SamStatus::INVALID,
1169  "rmTags called with improperly formatted tags.\n");
1170  returnStatus = false;
1171  break;
1172  }
1173  else
1174  {
1175  // Last Tag.
1176  currentTagPtr += 4;
1177  }
1178  }
1179 
1180  // The buffer tags are now out of sync.
1181  myNeedToSetTagsInBuffer = true;
1182  myIsTagsBufferValid = false;
1183  myIsBufferSynced = false;
1184  myTagBufferSize -= rmBuffSize;
1185 
1186 
1187  return(returnStatus);
1188 }

References getString(), StatGenStatus::INVALID, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

◆ set0BasedMatePosition()

bool SamRecord::set0BasedMatePosition ( int32_t  matePosition)

Set the mate/next fragment's leftmost position using the specified 0-based (BAM format) value.

Internal processing handles the switching between SAM/BAM formats when read/written.

Parameters
matePosition  0-based start position of the mate
Returns
true if successfully set, false if not.

Definition at line 328 of file SamRecord.cpp.

329 {
330  myStatus = SamStatus::SUCCESS;
331  myRecordPtr->myMatePosition = matePosition;
332  return true;
333 }

References StatGenStatus::SUCCESS.

Referenced by set1BasedMatePosition().

◆ set0BasedPosition()

bool SamRecord::set0BasedPosition ( int32_t  position)

Set the leftmost position using the specified 0-based (BAM format) value.

Internal processing handles the switching between SAM/BAM formats when read/written.

Parameters
position  0-based start position
Returns
true if successfully set, false if not.

Definition at line 242 of file SamRecord.cpp.

243 {
244  myStatus = SamStatus::SUCCESS;
245  myRecordPtr->myPosition = position;
246  myIsBinValid = false;
247  return true;
248 }

References StatGenStatus::SUCCESS.

Referenced by set1BasedPosition(), and SamFilter::softClip().

◆ set1BasedMatePosition()

bool SamRecord::set1BasedMatePosition ( int32_t  matePosition)

Set the mate/next fragment's leftmost position (PNEXT) using the specified 1-based (SAM format) value.

Internal processing handles the switching between SAM/BAM formats when read/written.

Parameters
matePosition  1-based start position of the mate
Returns
true if successfully set, false if not.

Definition at line 322 of file SamRecord.cpp.

323 {
324  return(set0BasedMatePosition(matePosition - 1));
325 }

References set0BasedMatePosition().

◆ set1BasedPosition()

bool SamRecord::set1BasedPosition ( int32_t  position)

Set the leftmost position (POS) using the specified 1-based (SAM format) value.

Internal processing handles the switching between SAM/BAM formats when read/written.

Parameters
position  1-based start position
Returns
true if successfully set, false if not.

Definition at line 236 of file SamRecord.cpp.

237 {
238  return(set0BasedPosition(position - 1));
239 }

References set0BasedPosition().
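
The relationship between the two coordinate systems is a fixed offset of one, as the listing shows. A tiny standalone sketch (the function names are hypothetical, not libStatGen API):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of libStatGen.
// SAM text uses 1-based positions; BAM stores 0-based positions.
int32_t toZeroBased(int32_t oneBasedPosition)
{
    return oneBasedPosition - 1;
}

int32_t toOneBased(int32_t zeroBasedPosition)
{
    return zeroBasedPosition + 1;
}
```

With this mapping a 1-based position of 0 corresponds to a 0-based position of -1, which is the same value resetRecord() assigns to myPosition by default.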

◆ setBuffer()

SamStatus::Status SamRecord::setBuffer ( const char *  fromBuffer,
uint32_t  fromBufferSize,
SamFileHeader & header 
)

Sets the SamRecord to contain the information in the BAM formatted fromBuffer.

Parameters
fromBuffer  buffer to read the BAM record from.
fromBufferSize  size of the buffer containing the BAM record.
header  BAM header for the record.
Returns
status of reading the BAM record from the buffer.

Definition at line 525 of file SamRecord.cpp.

528 {
529  myStatus = SamStatus::SUCCESS;
530  if((fromBuffer == NULL) || (fromBufferSize == 0))
531  {
532  // Buffer is empty.
533  myStatus.setStatus(SamStatus::FAIL_PARSE,
534  "Cannot parse an empty file.");
535  return(SamStatus::FAIL_PARSE);
536  }
537 
538  // Clear the record.
539  resetRecord();
540 
541  // allocate space for the record size.
542  if(!allocateRecordStructure(fromBufferSize))
543  {
544  // Failed to allocate space.
545  return(SamStatus::FAIL_MEM);
546  }
547 
548  memcpy(myRecordPtr, fromBuffer, fromBufferSize);
549 
550  setVariablesForNewBuffer(header);
551 
552  // Return the status of the record.
553  return(SamStatus::SUCCESS);
554 }

References StatGenStatus::FAIL_MEM, StatGenStatus::FAIL_PARSE, resetRecord(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

◆ setBufferFromFile()

SamStatus::Status SamRecord::setBufferFromFile ( IFILE  filePtr,
SamFileHeader & header 
)

Read the BAM record from a file.

Parameters
filePtr  file to read the buffer from.
header  BAM header for the record.
Returns
status of reading the BAM record from the file.

Definition at line 558 of file SamRecord.cpp.

560 {
561  myStatus = SamStatus::SUCCESS;
562  if((filePtr == NULL) || (filePtr->isOpen() == false))
563  {
564  // File is not open, return failure.
565  myStatus.setStatus(SamStatus::FAIL_ORDER,
566  "Can't read from an unopened file.");
567  return(SamStatus::FAIL_ORDER);
568  }
569 
570  // Clear the record.
571  resetRecord();
572 
573  // read the record size.
574  int numBytes =
575  ifread(filePtr, &(myRecordPtr->myBlockSize), sizeof(int32_t));
576 
577  // Check to see if the end of the file was hit and no bytes were read.
578  if(ifeof(filePtr) && (numBytes == 0))
579  {
580  // End of file, nothing was read, no more records.
581  myStatus.setStatus(SamStatus::NO_MORE_RECS,
582  "No more records left to read.");
583  return(SamStatus::NO_MORE_RECS);
584  }
585 
586  if(numBytes != sizeof(int32_t))
587  {
588  // Failed to read the entire block size. Either the end of the file
589  // was reached early or there was an error.
590  if(ifeof(filePtr))
591  {
592  // Error: end of the file reached prior to reading the rest of the
593  // record.
594  myStatus.setStatus(SamStatus::FAIL_PARSE,
595  "EOF reached in the middle of a record.");
596  return(SamStatus::FAIL_PARSE);
597  }
598  else
599  {
600  // Error reading.
601  myStatus.setStatus(SamStatus::FAIL_IO,
602  "Failed to read the record size.");
603  return(SamStatus::FAIL_IO);
604  }
605  }
606 
607  // allocate space for the record size.
608  if(!allocateRecordStructure(myRecordPtr->myBlockSize + sizeof(int32_t)))
609  {
610  // Failed to allocate space.
611  // Status is set by allocateRecordStructure.
612  return(SamStatus::FAIL_MEM);
613  }
614 
615  // Read the rest of the alignment block, starting at the reference id.
616  if(ifread(filePtr, &(myRecordPtr->myReferenceID), myRecordPtr->myBlockSize)
617  != (unsigned int)myRecordPtr->myBlockSize)
618  {
619  // Error reading the record. Reset it and return failure.
620  resetRecord();
621  myStatus.setStatus(SamStatus::FAIL_IO,
622  "Failed to read the record");
623  return(SamStatus::FAIL_IO);
624  }
625 
626  setVariablesForNewBuffer(header);
627 
628  // Return the status of the record.
629  return(SamStatus::SUCCESS);
630 }

References StatGenStatus::FAIL_IO, StatGenStatus::FAIL_MEM, StatGenStatus::FAIL_ORDER, StatGenStatus::FAIL_PARSE, ifeof(), ifread(), InputFile::isOpen(), StatGenStatus::NO_MORE_RECS, resetRecord(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
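
The control flow of the listing (read a 4-byte size, tell a clean end of file apart from a truncated record, then read the body) is a general pattern for length-prefixed blocks. The sketch below reproduces it with the C++ standard library only. It is not libStatGen code: `ReadStatus` and `readBlock` are hypothetical names, `std::istream` stands in for IFILE, and the size is decoded as little-endian on the assumption that the input follows the BAM convention.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical status values loosely modeled on the ones in the listing.
enum class ReadStatus { Success, NoMoreRecords, FailParse, FailIo };

// Reads one length-prefixed block: a 4-byte little-endian size followed
// by that many bytes. 'body' receives the bytes after the size field.
ReadStatus readBlock(std::istream& in, std::vector<unsigned char>& body)
{
    body.clear();
    unsigned char sizeBytes[4];
    in.read(reinterpret_cast<char*>(sizeBytes), 4);
    const std::streamsize got = in.gcount();

    if (got == 0 && in.eof())
    {
        return ReadStatus::NoMoreRecords;  // Clean end of input.
    }
    if (got != 4)
    {
        // Partial size field: truncated input, or an I/O error.
        return in.eof() ? ReadStatus::FailParse : ReadStatus::FailIo;
    }

    const uint32_t size = static_cast<uint32_t>(sizeBytes[0])
        | (static_cast<uint32_t>(sizeBytes[1]) << 8)
        | (static_cast<uint32_t>(sizeBytes[2]) << 16)
        | (static_cast<uint32_t>(sizeBytes[3]) << 24);
    if (size > 0x7fffffffu)
    {
        return ReadStatus::FailParse;  // Negative as a signed 32-bit size.
    }

    body.resize(size);
    if (size > 0)
    {
        in.read(reinterpret_cast<char*>(body.data()),
                static_cast<std::streamsize>(size));
        if (static_cast<uint32_t>(in.gcount()) != size)
        {
            body.clear();
            return ReadStatus::FailIo;  // Body shorter than announced.
        }
    }
    return ReadStatus::Success;
}
```

Calling `readBlock` repeatedly on a stream that holds whole blocks returns `Success` once per block and then `NoMoreRecords`. A production reader would also want an upper bound on the announced size before allocating.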

◆ setCigar() [1/2]

bool SamRecord::setCigar ( const char *  cigar)

Set the CIGAR to the specified SAM formatted cigar string.

Internal processing handles the switching between SAM/BAM formats when read/written.

Parameters
cigar  string containing the SAM formatted cigar.
Returns
true if successfully set, false if not.

Definition at line 259 of file SamRecord.cpp.

260 {
261  myStatus = SamStatus::SUCCESS;
262  myCigar = cigar;
263 
264  myIsBufferSynced = false;
265  myIsCigarBufferValid = false;
266  myCigarTempBufferLength = -1;
267  myIsBinValid = false;
268 
269  // Initialize the calculated alignment info to the uncalculated value.
270  myAlignmentLength = -1;
271  myUnclippedStartOffset = -1;
272  myUnclippedEndOffset = -1;
273 
274  return true;
275 }

References StatGenStatus::SUCCESS.

Referenced by SamFilter::filterRead(), shiftIndelsLeft(), and SamFilter::softClip().

◆ setCigar() [2/2]

bool SamRecord::setCigar ( const Cigar & cigar)

Set the CIGAR to the specified Cigar object.

Internal processing handles the switching between SAM/BAM formats when read/written.

Parameters
cigar  object to set this record's cigar to have.
Returns
true if successfully set, false if not.

Definition at line 278 of file SamRecord.cpp.

279 {
280  myStatus = SamStatus::SUCCESS;
281  cigar.getCigarString(myCigar);
282 
283  myIsBufferSynced = false;
284  myIsCigarBufferValid = false;
285  myCigarTempBufferLength = -1;
286  myIsBinValid = false;
287 
288  // Initialize the calculated alignment info to the uncalculated value.
289  myAlignmentLength = -1;
290  myUnclippedStartOffset = -1;
291  myUnclippedEndOffset = -1;
292 
293  return true;
294 }

References Cigar::getCigarString(), and StatGenStatus::SUCCESS.

◆ setFlag()

bool SamRecord::setFlag ( uint16_t  flag)

Set the bitwise FLAG to the specified value.

Parameters
flag  integer flag to use.
Returns
true if successfully set, false if not.

Definition at line 215 of file SamRecord.cpp.

216 {
217  myStatus = SamStatus::SUCCESS;
218  myRecordPtr->myFlag = flag;
219  return true;
220 }

References StatGenStatus::SUCCESS.

Referenced by SamFilter::filterRead().

◆ setInsertSize()

bool SamRecord::setInsertSize ( int32_t  insertSize)

Sets the inferred insert size (ISIZE)/observed template length (TLEN).

Parameters
insertSize  inferred insert size/observed template length.
Returns
true if successfully set, false if not.

Definition at line 336 of file SamRecord.cpp.

337 {
338  myStatus = SamStatus::SUCCESS;
339  myRecordPtr->myInsertSize = insertSize;
340  return true;
341 }

References StatGenStatus::SUCCESS.

◆ setMapQuality()

bool SamRecord::setMapQuality ( uint8_t  mapQuality)

Set the mapping quality (MAPQ).

Parameters
mapQuality  map quality to set in the record.
Returns
true if successfully set, false if not.

Definition at line 251 of file SamRecord.cpp.

252 {
253  myStatus = SamStatus::SUCCESS;
254  myRecordPtr->myMapQuality = mapQuality;
255  return true;
256 }

References StatGenStatus::SUCCESS.

Referenced by SamFilter::filterRead().

◆ setMateReferenceName()

bool SamRecord::setMateReferenceName ( SamFileHeader & header,
const char *  mateReferenceName 
)

Set the mate/next fragment's reference sequence name (RNEXT) to the specified name, using the header to determine the mate reference id.

Parameters
header  SAM/BAM header to use to determine the mate reference id.
mateReferenceName  mate reference name to use.
Returns
true if successfully set, false if not.

Definition at line 297 of file SamRecord.cpp.

299 {
300  myStatus = SamStatus::SUCCESS;
301  // Set the mate reference, if it is "=", set it to be equal
302  // to myReferenceName. This assumes that setReferenceName has already
303  // been called.
304  if(strcmp(mateReferenceName, FIELD_ABSENT_STRING) == 0)
305  {
306  myMateReferenceName = myReferenceName;
307  }
308  else
309  {
310  myMateReferenceName = mateReferenceName;
311  }
312 
313  // Set the Mate Reference ID.
314  // If the reference ID does not already exist, add it (pass true)
315  myRecordPtr->myMateReferenceID =
316  header.getReferenceID(myMateReferenceName, true);
317 
318  return true;
319 }

References SamFileHeader::getReferenceID(), and StatGenStatus::SUCCESS.
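
The branch in the listing expands the SAM shorthand in which a mate reference name of "=" means "same as this record's reference name". A standalone sketch of just that step follows. It is hypothetical rather than libStatGen code: `resolveMateReferenceName` is an invented name, and it assumes, based on the comment in the listing, that FIELD_ABSENT_STRING is "=".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of libStatGen.
// Expands the "=" shorthand to the record's own reference name and
// returns any other mate name unchanged.
std::string resolveMateReferenceName(const std::string& mateReferenceName,
                                     const std::string& referenceName)
{
    if (mateReferenceName == "=")
    {
        return referenceName;
    }
    return mateReferenceName;
}
```

Because the expansion copies whatever reference name is currently stored, the record's own reference name has to be set before the mate's for "=" to resolve as intended.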

◆ setQuality()

bool SamRecord::setQuality ( const char *  quality)

Sets the quality (QUAL) to the specified SAM formatted quality string.

Internal processing handles switching between SAM/BAM formats when read/written.

Parameters
quality  SAM quality string.
Returns
true if successfully set, false if not.

Definition at line 357 of file SamRecord.cpp.

358 {
359  myStatus = SamStatus::SUCCESS;
360  myQuality = quality;
361  myIsBufferSynced = false;
362  myIsQualityBufferValid = false;
363  return true;
364 }

References StatGenStatus::SUCCESS.

◆ setReadName()

bool SamRecord::setReadName ( const char *  readName)

Set QNAME to the passed in name.

Parameters
readName  the read name to set the QNAME to.
Returns
true if successfully set, false if not.

Definition at line 193 of file SamRecord.cpp.

194 {
195  myReadName = readName;
196  myIsBufferSynced = false;
197  myIsReadNameBufferValid = false;
198  myStatus = SamStatus::SUCCESS;
199 
200  // The read name must at least have some length, otherwise this is a parsing
201  // error.
202  if(myReadName.Length() == 0)
203  {
204  // Invalid - reset ReadName return false.
205  myReadName = DEFAULT_READ_NAME;
206  myRecordPtr->myReadNameLength = DEFAULT_READ_NAME_LENGTH;
207  myStatus.setStatus(SamStatus::INVALID, "0 length Query Name.");
208  return(false);
209  }
210 
211  return true;
212 }

References StatGenStatus::INVALID, StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.

◆ setReference()

void SamRecord::setReference ( GenomeSequence * reference)

Set the reference to the specified genome sequence object.

Parameters
reference  pointer to the GenomeSequence object.

Definition at line 178 of file SamRecord.cpp.

179 {
180  myRefPtr = reference;
181 }

Referenced by SamFile::GetNumOverlaps(), SamFile::ReadRecord(), SamFile::validateSortOrder(), and SamFile::WriteRecord().

◆ setReferenceName()

bool SamRecord::setReferenceName ( SamFileHeader & header,
const char *  referenceName 
)

Set the reference sequence name (RNAME) to the specified name, using the header to determine the reference id.

Parameters
header  SAM/BAM header to use to determine the reference id.
referenceName  reference name to use.
Returns
true if successfully set, false if not

Definition at line 223 of file SamRecord.cpp.

225 {
226  myStatus = SamStatus::SUCCESS;
227 
228  myReferenceName = referenceName;
229  // If the reference ID does not already exist, add it (pass true)
230  myRecordPtr->myReferenceID = header.getReferenceID(referenceName, true);
231 
232  return true;
233 }

References SamFileHeader::getReferenceID(), and StatGenStatus::SUCCESS.

◆ setSequence()

bool SamRecord::setSequence ( const char *  seq)

Sets the sequence (SEQ) to the specified SAM formatted sequence string.

Internal processing handles switching between SAM/BAM formats when read/written.

Parameters
seq  SAM sequence string. May contain '='.
Returns
true if successfully set, false if not.

Definition at line 344 of file SamRecord.cpp.

345 {
346  myStatus = SamStatus::SUCCESS;
347  mySequence = seq;
348  mySeqWithEq.clear();
349  mySeqWithoutEq.clear();
350 
351  myIsBufferSynced = false;
352  myIsSequenceBufferValid = false;
353  return true;
354 }

References StatGenStatus::SUCCESS.

◆ setSequenceTranslation()

void SamRecord::setSequenceTranslation ( SequenceTranslation  translation)

Set the type of sequence translation to use when getting the sequence.

The default type (if this method is never called) is NONE (the sequence is left as-is). It can be overridden per call by using the accessors that take a SequenceTranslation parameter.

Parameters
translation  type of sequence translation to use.

Definition at line 187 of file SamRecord.cpp.

188 {
189  mySequenceTranslation = translation;
190 }

Referenced by SamFile::GetNumOverlaps(), SamFile::ReadRecord(), and SamFile::validateSortOrder().

◆ shiftIndelsLeft()

bool SamRecord::shiftIndelsLeft ( )

Shift the indels (if any) to the left by updating the CIGAR.

Returns
true if the cigar was shifted, false if not.
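
The core of the shift is a comparison loop: an insertion can move one base to the left whenever the read base just before it equals the last base of the insertion, because the implied reference stays the same. The sketch below isolates that loop on a plain string. It is a simplified illustration, not the library's implementation: `shiftInsertLeft` and its parameters are hypothetical, and it only computes the new start position without rewriting any CIGAR operations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of libStatGen.
// Returns the leftmost start index (0-based, within the read) to which
// an insertion can be shifted without changing the implied reference.
// read        - read bases
// insertStart - index of the first inserted base
// insertLen   - number of inserted bases
// prevOpStart - index where the preceding match/mismatch operation
//               begins; the insertion is never moved left of this
// Precondition: insertStart + insertLen <= read.size().
std::size_t shiftInsertLeft(const std::string& read,
                            std::size_t insertStart,
                            std::size_t insertLen,
                            std::size_t prevOpStart)
{
    if (insertLen == 0)
    {
        return insertStart;
    }
    std::size_t insertEnd = insertStart + insertLen - 1;
    // Shift while the base before the insertion equals its last base.
    while (insertStart > prevOpStart &&
           read[insertEnd] == read[insertStart - 1])
    {
        --insertEnd;
        --insertStart;
    }
    return insertStart;
}
```

For the read "AATTTTGG" with a two-base insertion at index 4 and the preceding operation starting at index 0, the insertion can slide across the run of T bases and `shiftInsertLeft("AATTTTGG", 4, 2, 0)` returns 2.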

Definition at line 368 of file SamRecord.cpp.

369 {
370  // Check to see whether or not the Cigar has already been
371  // set - this is determined by checking if alignment length
372  // is set since alignment length and the cigar are set
373  // at the same time.
374  if(myAlignmentLength == -1)
375  {
376  // Not been set, so calculate it.
377  parseCigar();
378  }
379 
380  // Track whether or not there was a shift.
381  bool shifted = false;
382 
383  // Cigar is set, so now myCigarRoller can be used.
384  // Track where in the read we are.
385  uint32_t currentPos = 0;
386 
387  // Since the loop starts at 1 because the first operation can't be shifted,
388  // increment the currentPos past the first operation.
389  if(Cigar::foundInQuery(myCigarRoller[0]))
390  {
391  // This op was found in the read, increment the current position.
392  currentPos += myCigarRoller[0].count;
393  }
394 
395  int numOps = myCigarRoller.size();
396 
397  // Loop through the cigar operations from the 2nd operation since
398  // the first operation is already on the end and can't shift.
399  for(int currentOp = 1; currentOp < numOps; currentOp++)
400  {
401  if(myCigarRoller[currentOp].operation == Cigar::insert)
402  {
403  // For now, only shift a max of 1 operation.
404  int prevOpIndex = currentOp-1;
405  // Track the next op for seeing if it is the same as the
406  // previous for merging reasons.
407  int nextOpIndex = currentOp+1;
408  if(nextOpIndex == numOps)
409  {
410  // There is no next op, so set it equal to the current one.
411  nextOpIndex = currentOp;
412  }
413  // The start of the previous operation, so we know when we hit it
414  // so we don't shift past it.
415  uint32_t prevOpStart =
416  currentPos - myCigarRoller[prevOpIndex].count;
417 
418  // We can only shift if the previous operation is a match or mismatch.
419  if(!Cigar::isMatchOrMismatch(myCigarRoller[prevOpIndex]))
420  {
421  // TODO - shift past pads
422  // An insert is in the read, so increment the position.
423  currentPos += myCigarRoller[currentOp].count;
424  // Not a match/mismatch, so can't shift into it.
425  continue;
426  }
427 
428  // It is a match or mismatch, so check to see if we can
429  // shift into it.
430 
431  // The end of the insert is calculated by adding the size
432  // of this insert minus 1 to the start of the insert.
433  uint32_t insertEndPos =
434  currentPos + myCigarRoller[currentOp].count - 1;
435 
436  // The insert starts at the current position.
437  uint32_t insertStartPos = currentPos;
438 
439  // Loop as long as the position before the insert start
440  // matches the last character in the insert. If they match,
441  // the insert can be shifted one index left because the
442  // implied reference will not change. If they do not match,
443  // we can't shift because the implied reference would change.
444  // Stop loop when insertStartPos = prevOpStart, because we
445  // don't want to move past that.
446  while((insertStartPos > prevOpStart) &&
447  (getSequence(insertEndPos,BASES) ==
448  getSequence(insertStartPos - 1, BASES)))
449  {
450  // We can shift, so move the insert start & end one left.
451  --insertEndPos;
452  --insertStartPos;
453  }
454 
455  // Determine if a shift has occurred.
456  int shiftLen = currentPos - insertStartPos;
457  if(shiftLen > 0)
458  {
459  // Shift occurred, so adjust the cigar if the cigar will
460  // not become more operations.
461  // If the next operation is the same as the previous or
462  // if the insert and the previous operation switch positions
463  // then the cigar has the same number of operations.
464  // If the next operation is different, and the shift splits
465  // the previous operation in 2, then the cigar would
466  // become longer, so we do not want to shift.
467  if(myCigarRoller[nextOpIndex].operation ==
468  myCigarRoller[prevOpIndex].operation)
469  {
470  // The operations are the same, so merge them by adding
471  // the length of the shift to the next operation.
472  myCigarRoller.IncrementCount(nextOpIndex, shiftLen);
473  myCigarRoller.IncrementCount(prevOpIndex, -shiftLen);
474 
475  // If the previous op length is 0, just remove that
476  // operation.
477  if(myCigarRoller[prevOpIndex].count == 0)
478  {
479  myCigarRoller.Remove(prevOpIndex);
480  }
481  shifted = true;
482  }
483  else
484  {
485  // Can only shift if the insert shifts past the
486  // entire previous operation, otherwise an operation
487  // would need to be added.
488  if(insertStartPos == prevOpStart)
489  {
490  // Swap the positions of the insert and the
491  // previous operation.
492  myCigarRoller.Update(currentOp,
493  myCigarRoller[prevOpIndex].operation,
494  myCigarRoller[prevOpIndex].count);
495  // Size of the previous op is the entire
496  // shift length.
497  myCigarRoller.Update(prevOpIndex,
498  Cigar::insert,
499  shiftLen);
500  shifted = true;
501  }
502  }
503  }
504  // An insert is in the read, so increment the position.
505  currentPos += myCigarRoller[currentOp].count;
506  }
507  else if(Cigar::foundInQuery(myCigarRoller[currentOp]))
508  {
509  // This op was found in the read, increment the current position.
510  currentPos += myCigarRoller[currentOp].count;
511  }
512  }
513  if(shifted)
514  {
515  // TODO - setCigar is currently inefficient because later the cigar
516  // roller will be recalculated, but for now it will work.
517  setCigar(myCigarRoller);
518  }
519  return(shifted);
520 }

References BASES, Cigar::foundInQuery(), getSequence(), CigarRoller::IncrementCount(), Cigar::insert, Cigar::isMatchOrMismatch(), CigarRoller::Remove(), setCigar(), Cigar::size(), and CigarRoller::Update().
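The left-shift rule in the listing above can be tried out in isolation. The following is a minimal sketch, not the library's implementation: `CigarOp`, `leftShiftLength()` and `applyLeftShift()` are hypothetical names introduced for illustration, a plain `std::string` stands in for the read sequence, and a `std::vector` stands in for the CigarRoller. In this sketch the swap case keeps each operation's own count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one CIGAR operation; not a libStatGen type.
struct CigarOp
{
    char op;             // 'M', 'I', 'D', 'S', ...
    std::uint32_t count; // number of bases covered by the operation
};

// Returns how many positions the insert occupying
// read[insertStart, insertStart + insertLen) can move left without
// crossing minStart (the start of the preceding operation).
std::size_t leftShiftLength(const std::string& read,
                            std::size_t insertStart,
                            std::size_t insertLen,
                            std::size_t minStart)
{
    assert(insertLen > 0 && insertStart + insertLen <= read.size());
    assert(minStart <= insertStart);

    std::size_t start = insertStart;
    std::size_t end = insertStart + insertLen - 1;
    // Shifting by one is safe while the base just before the insert equals
    // the last inserted base: the implied reference stays the same.
    while (start > minStart && read[end] == read[start - 1])
    {
        --start;
        --end;
    }
    return insertStart - start;
}

// Applies a left shift of shiftLen to the insert at cigar[insIdx], but only
// when the number of operations does not grow. Returns true if applied.
bool applyLeftShift(std::vector<CigarOp>& cigar,
                    std::size_t insIdx,
                    std::uint32_t shiftLen)
{
    assert(insIdx >= 1 && insIdx < cigar.size());
    assert(cigar[insIdx].op == 'I');
    CigarOp& prev = cigar[insIdx - 1];
    assert(shiftLen > 0 && shiftLen <= prev.count);

    const bool hasNext = insIdx + 1 < cigar.size();
    if (hasNext && cigar[insIdx + 1].op == prev.op)
    {
        // Same operation on both sides of the insert: move shiftLen bases
        // from the previous operation to the next one.
        cigar[insIdx + 1].count += shiftLen;
        prev.count -= shiftLen;
        if (prev.count == 0)
        {
            cigar.erase(cigar.begin() +
                        static_cast<std::ptrdiff_t>(insIdx - 1));
        }
        return true;
    }
    if (shiftLen == prev.count)
    {
        // The insert moved past the whole previous operation: swap them.
        std::swap(prev, cigar[insIdx]);
        return true;
    }
    // A partial shift would split the previous operation in two.
    return false;
}
```

For the read `ACGTTTTA` with CIGAR 4M2I2M, the insert starts at 0-based read position 4 and `leftShiftLength(read, 4, 2, 0)` reports a shift of 1; applying it with `applyLeftShift()` turns the CIGAR into 3M2I3M. With 4M2I2S the same one-base shift is rejected, because it would split the 4M into two operations.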

◆ writeRecordBuffer() [1/2]

SamStatus::Status SamRecord::writeRecordBuffer ( IFILE  filePtr)

Write the record as a BAM into the specified already opened file.

Parameters
filePtr  file to write the BAM record into.
Returns
status of the write.

Definition at line 1225 of file SamRecord.cpp.

1226 {
1227  return(writeRecordBuffer(filePtr, mySequenceTranslation));
1228 }

◆ writeRecordBuffer() [2/2]

SamStatus::Status SamRecord::writeRecordBuffer ( IFILE  filePtr,
SequenceTranslation  translation 
)

Write the record as a BAM into the specified already opened file using the specified translation on the sequence.

Parameters
filePtr  file to write the BAM record into.
translation  type of sequence translation to use.
Returns
status of the write.

Definition at line 1232 of file SamRecord.cpp.

1234 {
1235  myStatus = SamStatus::SUCCESS;
1236  if((filePtr == NULL) || (filePtr->isOpen() == false))
1237  {
1238  // File is not open, return failure.
1239  myStatus.setStatus(SamStatus::FAIL_ORDER,
1240  "Can't write to an unopened file.");
1241  return(SamStatus::FAIL_ORDER);
1242  }
1243 
1244  if((myIsBufferSynced == false) ||
1245  (myBufferSequenceTranslation != translation))
1246  {
1247  if(!fixBuffer(translation))
1248  {
1249  return(myStatus.getStatus());
1250  }
1251  }
1252 
1253  // Write the record.
1254  unsigned int numBytesToWrite = myRecordPtr->myBlockSize + sizeof(int32_t);
1255  unsigned int numBytesWritten =
1256  ifwrite(filePtr, myRecordPtr, numBytesToWrite);
1257 
1258  // Return status based on if the correct number of bytes were written.
1259  if(numBytesToWrite == numBytesWritten)
1260  {
1261  return(SamStatus::SUCCESS);
1262  }
1263  // The correct number of bytes were not written.
1264  myStatus.setStatus(SamStatus::FAIL_IO, "Failed to write the entire record.");
1265  return(SamStatus::FAIL_IO);
1266 }

References StatGenStatus::FAIL_IO, StatGenStatus::FAIL_ORDER, StatGenStatus::getStatus(), ifwrite(), InputFile::isOpen(), StatGenStatus::setStatus(), and StatGenStatus::SUCCESS.
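The size check at the end of the listing above relies on the block size not counting its own 4-byte field, so the number of bytes to write is the block size plus `sizeof(int32_t)`. The following is a minimal standalone sketch of that check, not libStatGen code: `BoundedSink`, `makeRecord()`, `writeRecord()` and `WriteStatus` are hypothetical names, the sink is an in-memory buffer that can run out of room, and the block size is stored in host byte order for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical status values mirroring the two outcomes named above.
enum class WriteStatus { SUCCESS, FAIL_IO };

// Hypothetical in-memory sink that accepts at most `capacity` bytes in
// total, used here to simulate a short write.
struct BoundedSink
{
    std::vector<unsigned char> bytes;
    std::size_t capacity;

    // Copies up to `size` bytes and returns how many were accepted.
    unsigned int write(const void* buffer, unsigned int size)
    {
        const std::size_t room = capacity - bytes.size();
        const unsigned int n =
            (size < room) ? size : static_cast<unsigned int>(room);
        const unsigned char* p = static_cast<const unsigned char*>(buffer);
        bytes.insert(bytes.end(), p, p + n);
        return n;
    }
};

// Builds a zero-filled record: a 4-byte block size followed by blockSize
// payload bytes (host byte order, an assumption of this sketch).
std::vector<unsigned char> makeRecord(std::int32_t blockSize)
{
    assert(blockSize >= 0);
    std::vector<unsigned char> record(
        sizeof(blockSize) + static_cast<std::size_t>(blockSize), 0);
    std::memcpy(record.data(), &blockSize, sizeof(blockSize));
    return record;
}

// Writes the whole record and reports FAIL_IO on a short write.
WriteStatus writeRecord(BoundedSink& sink,
                        const std::vector<unsigned char>& record)
{
    assert(record.size() >= sizeof(std::int32_t));
    std::int32_t blockSize = 0;
    std::memcpy(&blockSize, record.data(), sizeof(blockSize));
    assert(blockSize >= 0);

    // The block size excludes its own field, so add those 4 bytes.
    const unsigned int numBytesToWrite = static_cast<unsigned int>(
        static_cast<std::size_t>(blockSize) + sizeof(std::int32_t));
    assert(record.size() >= numBytesToWrite);

    const unsigned int numBytesWritten =
        sink.write(record.data(), numBytesToWrite);
    return (numBytesWritten == numBytesToWrite) ? WriteStatus::SUCCESS
                                                : WriteStatus::FAIL_IO;
}
```

A record built with `makeRecord(10)` occupies 14 bytes. Writing it to a sink with room for 100 bytes succeeds, while a sink limited to 8 bytes accepts only part of the record and the sketch reports `FAIL_IO`.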


The documentation for this class was generated from the following files: