Class VcfEntry

All Implemented Interfaces:
Serializable, Cloneable, Comparable<Interval>, Iterable<VcfGenotype>, TxtSerializable

public class VcfEntry extends Marker implements Iterable<VcfGenotype>
A VCF entry is a line in a VCF file. A VCF line can have multiple variants and multiple genotypes.
Author:
pablocingolani
See Also:
  • Field Details

  • Constructor Details

  • Method Details

    • cleanUnderscores

      public static String cleanUnderscores(String s)
      Return a string without leading, trailing and duplicated underscores
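      A minimal sketch of the documented behaviour, assuming "duplicated" means runs of consecutive underscores. It is not the library's actual implementation, and the class name UnderscoreCleaner is hypothetical.

```java
final class UnderscoreCleaner {
    /** Collapse runs of '_' into a single '_', then strip a leading and a trailing '_'. */
    static String cleanUnderscores(String s) {
        String collapsed = s.replaceAll("_+", "_"); // "a__b" -> "a_b"
        return collapsed.replaceAll("^_|_$", "");   // "_a_b_" -> "a_b"
    }
}
```

      For example, cleanUnderscores("__a__b_c___") yields "a_b_c". Collapsing first means a single pass of the strip step is enough to remove all leading and trailing underscores.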
    • isEmpty

      public static boolean isEmpty(String value)
      Does 'value' represent an EMPTY / MISSING value in a VCF field (or multiple comma-separated MISSING values)?
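      A sketch of one plausible reading of this check, assuming the VCF missing-value marker "." and treating null, the empty string, and lists such as ".,.,." as missing. The exact rules used by VcfEntry (for example, how empty list elements are handled) may differ; the class VcfMissing is hypothetical.

```java
final class VcfMissing {
    /** True for null, "", "." or a comma-separated list whose every element is ".". */
    static boolean isEmpty(String value) {
        if (value == null || value.isEmpty()) return true;
        // limit -1 keeps trailing empty elements, so ".," is not treated as "."
        for (String part : value.split(",", -1)) {
            if (!part.equals(".")) return false; // at least one real value present
        }
        return true;
    }
}
```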
    • isValidInfoKey

      public static boolean isValidInfoKey(String key)
      Check that the INFO key matches the regular expression given in the VCF 4.3 specification
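      A sketch of the check, assuming the INFO key pattern as we read it in the VCF 4.3 specification: ^([A-Za-z_][0-9A-Za-z_.]*|1000G)$. That is, a letter or underscore followed by letters, digits, underscores or dots, with "1000G" allowed as a legacy exception. The class InfoKeyCheck is hypothetical.

```java
import java.util.regex.Pattern;

final class InfoKeyCheck {
    // INFO key pattern from the VCF 4.3 specification (assumed; verify against the spec)
    private static final Pattern INFO_KEY =
            Pattern.compile("^([A-Za-z_][0-9A-Za-z_.]*|1000G)$");

    static boolean isValidInfoKey(String key) {
        return key != null && INFO_KEY.matcher(key).matches();
    }
}
```

      Compiling the pattern once into a static field avoids recompiling it for every key, which matters when validating every INFO entry of a large file.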
    • isValidInfoValue

      public static boolean isValidInfoValue(String value)
      Check that this value can be added to an INFO field
      Returns:
      true if OK, false if invalid value
    • vcfInfoDecode

      public static String vcfInfoDecode(String str)
      Decode INFO value
    • vcfInfoEncode

      public static String vcfInfoEncode(String str)
      Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as field delimiters ';' in INFO or ':' FORMAT fields) must be represented using the capitalized percent encoding:
        %3A  : (colon)
        %3B  ; (semicolon)
        %3D  = (equal sign)
        %25  % (percent sign)
        %2C  , (comma)
        %0D  CR
        %0A  LF
        %09  TAB
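      A sketch of how the encoding table above can be applied and reversed (vcfInfoEncode / vcfInfoDecode), assuming only the eight listed characters are escaped. It is not the library's own code, and the class InfoPercentCodec is hypothetical. Encoding character by character guarantees that a '%' produced by an escape is never encoded a second time.

```java
import java.util.Map;

final class InfoPercentCodec {
    // Escapes listed in the VCF 4.3 specification for INFO/FORMAT values
    private static final Map<String, Character> DECODE = Map.of(
            "%3A", ':', "%3B", ';', "%3D", '=', "%25", '%',
            "%2C", ',', "%0D", '\r', "%0A", '\n', "%09", '\t');

    /** Replace each special character with its capitalized percent escape. */
    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case ':': sb.append("%3A"); break;
                case ';': sb.append("%3B"); break;
                case '=': sb.append("%3D"); break;
                case '%': sb.append("%25"); break;
                case ',': sb.append("%2C"); break;
                case '\r': sb.append("%0D"); break;
                case '\n': sb.append("%0A"); break;
                case '\t': sb.append("%09"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Decode only the escapes listed above; any other '%' is copied unchanged. */
    static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '%' && i + 3 <= s.length()) {
                Character c = DECODE.get(s.substring(i, i + 3));
                if (c != null) {
                    sb.append(c.charValue());
                    i += 3; // skip the whole escape
                    continue;
                }
            }
            sb.append(s.charAt(i));
            i++;
        }
        return sb.toString();
    }
}
```

      For example, encode("a=b;c") returns "a%3Db%3Bc", and decode(encode(x)) returns x for any input.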
    • vcfInfoKeySafe

      public static String vcfInfoKeySafe(String str)
      Return a string safe to be used in an 'INFO' field key
    • vcfInfoValueSafe

      public static String vcfInfoValueSafe(String str)
      Return a string safe to be used in an 'INFO' field value
    • addFilter

      public void addFilter(String filterStr)
      Add string to FILTER field
    • addFormat

      public void addFormat(String formatName)
      Add a 'FORMAT' field
    • addGenotype

      public void addGenotype(String vcfGenotypeStr)
      Add a genotype as a string
    • addInfo

      public void addInfo(String key, String value)
      Add a "key=value" tuple to the INFO field
      Parameters:
      key - INFO key name
      value - Can be null if it is a boolean (Flag) field.
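      A sketch of what a null value means when the INFO column is written back out: a Flag entry appears as the bare key, while other entries appear as key=value, separated by ';', and an empty INFO column is written as ".". The insertion-ordered map is an assumption for illustration only; VcfEntry's internal representation is not documented here, and the class InfoFieldSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class InfoFieldSketch {
    // Insertion-ordered, so keys are serialized in the order they were added
    private final Map<String, String> info = new LinkedHashMap<>();

    /** A null value marks a Flag (boolean) INFO entry. */
    public void addInfo(String key, String value) {
        info.put(key, value);
    }

    /** Serialize as "KEY=VALUE;FLAG;...", or "." when there are no entries. */
    public String toInfoString() {
        if (info.isEmpty()) return ".";
        StringJoiner sj = new StringJoiner(";");
        for (Map.Entry<String, String> e : info.entrySet()) {
            sj.add(e.getValue() == null ? e.getKey() : e.getKey() + "=" + e.getValue());
        }
        return sj.toString();
    }
}
```

      With addInfo("DP", "14") followed by addInfo("DB", null), the serialized INFO column is "DP=14;DB".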
    • alleleFrequencyType

      public VcfEntry.AlleleFrequencyType alleleFrequencyType()
      Categorization by allele frequency
    • calcHetero

      public Boolean calcHetero()
      Is this entry heterozygous? Hom/Het is inferred only if there is exactly one sample in the file; otherwise the result is null.
    • check

      public String check()
      Perform several simple checks and report problems (if any).
    • cloneShallow

      public Cds cloneShallow()
      Description copied from class: Marker
      Perform a shallow clone
      Overrides:
      cloneShallow in class Marker
    • compressGenotypes

      public boolean compressGenotypes()
      Compress genotypes into "HO/HE/NA" INFO fields
    • delFilter

      public boolean delFilter(String filterStr)
      Remove a string from FILTER field
    • getAltIndex

      public int getAltIndex(String alt)
      Get index of matching ALT entry
      Returns:
      -1 if not found
    • getAlts

      public String[] getAlts()
    • getAltsStr

      public String getAltsStr()
      Create a comma-separated ALTs string
    • getChromosomeNameOri

      public String getChromosomeNameOri()
      Original chromosome name (as it appeared in the VCF file)
      Overrides:
      getChromosomeNameOri in class Interval
    • getFilter

      public String getFilter()
    • getFormat

      public String getFormat()
    • getFormatFields

      public String[] getFormatFields()
    • getGenotypesScores

      public byte[] getGenotypesScores()
      Return genotypes parsed as an array of codes
    • getInfo

      public String getInfo(String key)
      Get info string
    • getInfo

      public String getInfo(String key, String allele)
      Get info string for a specific allele
    • getInfo

      public String getInfo(String key, Variant var)
      Get an INFO field matching a variant
    • getInfoFlag

      public boolean getInfoFlag(String key)
      Does the INFO entry 'key' exist?
    • getInfoFloat

      public double getInfoFloat(String key)
      Get an INFO field as a 'double' number. The method is named after the VCF data type 'FLOAT', which is why its name may not be intuitive.
    • getInfoInt

      public long getInfoInt(String key)
      Get an INFO field as a 'long' number. The method is named after the VCF data type 'INT', which is why its name may not be intuitive.
    • getInfoKeys

      public Set<String> getInfoKeys()
      Get all keys available in the info field
    • getInfoStr

      public String getInfoStr()
      Get the full (unparsed) INFO field
    • getLine

      public String getLine()
      Original VCF line (from file)
    • getLineNum

      public int getLineNum()
    • getNumberOfSamples

      public int getNumberOfSamples()
      Number of samples in this VCF file
    • getQuality

      public double getQuality()
    • getRef

      public String getRef()
    • getStr

      public String getStr()
    • getVcfEffects

      public List<VcfEffect> getVcfEffects()
    • getVcfEffects

      public List<VcfEffect> getVcfEffects(EffFormatVersion formatVersion)
      Parse 'EFF' info field and get a list of effects
    • getVcfFileIterator

      public VcfFileIterator getVcfFileIterator()
    • getVcfGenotype

      public VcfGenotype getVcfGenotype(int index)
    • getVcfGenotypes

      public List<VcfGenotype> getVcfGenotypes()
    • getVcfInfo

      public VcfHeaderInfo getVcfInfo(String id)
      Get VcfInfo type for a given ID
    • getVcfInfoNumber

      public VcfInfoType getVcfInfoNumber(String id)
      Get Info number for a given ID
    • hasField

      public boolean hasField(String filedName)
    • hasGenotypes

      public boolean hasGenotypes()
    • hasInfo

      public boolean hasInfo(String infoFieldName)
    • hasQuality

      public boolean hasQuality()
    • isBiAllelic

      public boolean isBiAllelic()
      Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
    • isCompressedGenotypes

      public boolean isCompressedGenotypes()
      Do we have compressed genotypes in "HO,HE,NA" INFO fields?
    • isFilterPass

      public boolean isFilterPass()
    • isMultiallelic

      public boolean isMultiallelic()
      Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: use the 'calcHetero()' method for a more precise calculation.
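      "Based ONLY on the number of ALTs" can be illustrated by counting the comma-separated alleles in the ALT column, where "." means there is no alternate allele. This is an assumed reading of isBiAllelic() and isMultiallelic(), not their actual code. It ignores the genotypes entirely, which is why calcHetero() is recommended instead. The class AltCount is hypothetical.

```java
final class AltCount {
    /** Split the ALT column; "." means no alternate allele. */
    static String[] alts(String altColumn) {
        if (altColumn.equals(".")) return new String[0];
        return altColumn.split(",");
    }

    /** REF plus exactly one ALT. */
    static boolean isBiAllelic(String altColumn) {
        return alts(altColumn).length == 1;
    }

    /** REF plus two or more ALTs. */
    static boolean isMultiallelic(String altColumn) {
        return alts(altColumn).length > 1;
    }
}
```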
    • isShowWarningIfParentDoesNotInclude

      protected boolean isShowWarningIfParentDoesNotInclude()
      Description copied from class: Marker
      Show an error if parent does not include child?
      Overrides:
      isShowWarningIfParentDoesNotInclude in class Marker
    • isSingleSnp

      public boolean isSingleSnp()
      Is this a VCF entry with a single SNP?
    • isSingleton

      public boolean isSingleton()
      Is this variant a singleton (does it appear in only one genotype)?
    • isVariant

      public boolean isVariant()
      Is this a change, or are the ALTs actually the same as the reference?
    • isVariant

      public boolean isVariant(String alt)
      Is this ALT string a variant?
    • iterator

      public Iterator<VcfGenotype> iterator()
      Specified by:
      iterator in interface Iterable<VcfGenotype>
    • mac

      public int mac()
      Calculate the minor allele count (MAC)
    • maf

      public double maf()
      Calculate the minor allele frequency (MAF)
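      A worked sketch of MAC and MAF computed from genotype strings such as "0/1", "1|1" or "./.". Assumptions, not taken from the documentation above: missing alleles (".") are skipped, and every non-zero allele index is counted as non-reference, so the ALTs of a multi-allelic site are pooled. The minor allele is whichever of REF or non-REF has fewer copies, and MAF = MAC / (number of called alleles). The class MinorAllele is hypothetical.

```java
final class MinorAllele {
    /** Return {refCopies, nonRefCopies} over all called alleles. */
    static int[] countAlleles(String[] genotypes) {
        int ref = 0, nonRef = 0;
        for (String gt : genotypes) {
            for (String a : gt.split("[/|]")) { // unphased '/' or phased '|'
                if (a.equals(".")) continue;   // missing call, not counted
                if (a.equals("0")) ref++; else nonRef++;
            }
        }
        return new int[] {ref, nonRef};
    }

    /** Minor allele count: copies of the less frequent of REF and non-REF. */
    static int mac(String[] genotypes) {
        int[] c = countAlleles(genotypes);
        return Math.min(c[0], c[1]);
    }

    /** Minor allele frequency: MAC divided by the number of called alleles. */
    static double maf(String[] genotypes) {
        int[] c = countAlleles(genotypes);
        int total = c[0] + c[1];
        return total == 0 ? 0.0 : (double) Math.min(c[0], c[1]) / total;
    }
}
```

      For {"0/0", "0|1", "0/0", "./."} there are 5 REF and 1 non-REF copies among 6 called alleles, so MAC = 1 and MAF = 1/6. For {"1/1", "1/1", "0/1"} the minor allele is the REF allele, giving the same MAC and MAF.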
    • parse

      public void parse()
      Parse a 'line' from a 'vcfFileIterator'
    • parseLof

      public List<VcfLof> parseLof()
      Parse LOF from VcfEntry
    • parseNmd

      public List<VcfNmd> parseNmd()
      Parse NMD from VcfEntry
    • removeInfo

      public void removeInfo(String key)
      Remove INFO field
    • rmInfo

      public boolean rmInfo(String info)
      Parse INFO fields
    • setFilter

      public void setFilter(String filter)
    • setFormat

      public void setFormat(String format)
    • setGenotypeStr

      public void setGenotypeStr(String genotypeFieldsStr)
    • setLineNum

      public void setLineNum(int lineNum)
    • toStr

      public String toStr()
      Return a string in the simple "CHR:START_REF/ALTs" format
      Overrides:
      toStr in class Interval
    • toString

      public String toString()
      Overrides:
      toString in class Marker
    • toStringNoGt

      public String toStringNoGt()
      Show only first eight fields (no genotype entries)
    • uncompressGenotypes

      public VcfEntry uncompressGenotypes()
      Uncompress VCF entry having genotypes in "HO,HE,NA" fields
    • variants

      public List<Variant> variants()
      Create a list of variants from this VcfEntry