Package org.snpeff.vcf
Class VcfEntry
java.lang.Object
org.snpeff.interval.Interval
org.snpeff.interval.Marker
org.snpeff.vcf.VcfEntry
- All Implemented Interfaces:
Serializable, Cloneable, Comparable<Interval>, Iterable<VcfGenotype>, TxtSerializable
A VCF entry is a line in a VCF file.
A VCF line can have multiple variants and multiple genotypes.
- Author:
- pablocingolani
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type, Field, Description
static final double
static final double
protected String[]
protected String
protected String
static final String[]
protected String
static final String
protected String
protected String[]
protected String[]
protected String
protected byte[]
static final Pattern
protected String
protected String
protected int
protected Double
protected String
static final String
protected LinkedList<Variant>
static final String
static final String[]
static final String
static final String[]
static final String
static final String[]
static final String
static final String
static final String
static final String
static final String
protected VcfFileIterator
protected ArrayList<VcfGenotype>
static final char
Fields inherited from class org.snpeff.interval.Interval
chromosomeNameOri, end, id, parent, start, strandMinus
-
Constructor Summary
Constructors
Constructor, Description
VcfEntry(VcfFileIterator vcfFileIterator, String line, int lineNum, boolean parseNow): Create a line from a file iterator
VcfEntry(VcfFileIterator vcfFileIterator, Marker parent, String chromosomeName, int start, String id, String ref, String altsStr, double quality, String filterPass, String infoStr, String format)
-
Method Summary
Modifier and Type, Method, Description
void addFilter: Add string to FILTER field
void addFormat: Add a 'FORMAT' field
void addGenotype(String vcfGenotypeStr): Add a genotype as a string
void addInfo: Add a "key=value" tuple to the INFO field
alleleFrequencyType: Categorization by allele frequency
calcHetero: Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file.
check(): Perform several simple checks and report problems (if any).
static String cleanUnderscores: Return a string without leading, trailing and duplicated underscores
cloneShallow: Perform a shallow clone
boolean compressGenotypes(): Compress genotypes into "HO/HE/NA" INFO fields
boolean delFilter: Remove a string from FILTER field
int getAltIndex(String alt): Get index of matching ALT entry
String[] getAlts()
getAltsStr: Create a comma separated ALTS string
getChromosomeNameOri: Original chromosome name (as it appeared in the VCF file)
getFilter
getFormat
String[] getFormatFields
byte[] getGenotypesScores(): Return genotypes parsed as an array of codes
getInfo: Get info string
getInfo: Get info string for a specific allele
getInfo: Get an INFO field matching a variant
boolean getInfoFlag(String key): Does the entry exist?
double getInfoFloat(String key): Get info field as a 'double' number. The norm specifies the data type as 'FLOAT', which is why the name of this method might not be intuitive.
long getInfoInt(String key): Get info field as a 'long' number. The norm specifies the data type as 'INT', which is why the name of this method might not be intuitive.
getInfoKeys: Get all keys available in the info field
getInfoStr: Get the full (unparsed) INFO field
getLine(): Original VCF line (from file)
int getLineNum()
int getNumberOfSamples(): Number of samples in this VCF file
double getQuality()
getRef()
getStr()
getVcfEffects
getVcfEffects(EffFormatVersion formatVersion): Parse 'EFF' info field and get a list of effects
getVcfFileIterator
getVcfGenotype(int index)
getVcfGenotypes
getVcfInfo(String id): Get VcfInfo type for a given ID
getVcfInfoNumber: Get Info number for a given ID
boolean hasField
boolean hasGenotypes()
boolean hasInfo
boolean hasQuality()
boolean isBiAllelic(): Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
boolean isCompressedGenotypes(): Do we have compressed genotypes in "HO,HE,NA" INFO fields?
static boolean isEmpty: Does 'value' represent an EMPTY / MISSING value in a VCF field? (or multiple MISSING comma-separated values)
boolean isFilterPass()
boolean isMultiallelic(): Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
protected boolean isShowWarningIfParentDoesNotInclude(): Show an error if parent does not include child?
boolean isSingleSnp(): Is this a VCF entry with a single SNP?
boolean isSingleton(): Is this variant a singleton (appears only in one genotype)?
static boolean isValidInfoKey(String key): Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3)
static boolean isValidInfoValue(String value): Check that this value can be added to an INFO field
boolean isVariant(): Is this a change or are the ALTs actually the same as the reference?
boolean isVariant: Is this ALT string a variant?
iterator()
int mac(): Calculate minor allele count
double maf(): Calculate minor allele frequency
void parse(): Parse a 'line' from a 'vcfFileIterator'
parseLof(): Parse LOF from VcfEntry
parseNmd(): Parse NMD from VcfEntry
void removeInfo(String key): Remove INFO field
boolean rmInfo: Parse INFO fields
void setFilter
void setFormat
void setGenotypeStr(String genotypeFieldsStr)
void setLineNum(int lineNum)
toStr(): To string as a simple "CHR:START_REF/ALTs" format
toString()
toStringNoGt: Show only first eight fields (no genotype entries)
uncompressGenotypes: Uncompress VCF entry having genotypes in "HO,HE,NA" fields
variants(): Create a list of variants from this VcfEntry
static String vcfInfoDecode(String str): Decode INFO value
static String vcfInfoEncode(String str): Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as field delimiters ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding: %3A (colon), %3B (semicolon), %3D (equal sign), %25 (percent sign), %2C (comma), %0D (CR), %0A (LF), %09 (TAB).
static String vcfInfoKeySafe(String str): Return a string safe to be used in an 'INFO' field key
static String vcfInfoValueSafe(String str): Return a string safe to be used in an 'INFO' field value
Methods inherited from class org.snpeff.interval.Marker
adjust, apply, applyDel, applyDup, applyIns, applyMixed, clone, codonTable, compareTo, compareToPos, distance, distanceBases, getParent, getType, idChain, idChain, idChain, includes, intersect, isAdjustIfParentDoesNotInclude, isDeferredAnalysis, minus, query, query, readTxt, serializeParse, serializeSave, shouldApply, union, variantEffect, variantEffectNonRef
Methods inherited from class org.snpeff.interval.Interval
equals, findParent, getChromosome, getChromosomeName, getChromosomeNum, getEnd, getGenome, getGenomeName, getId, getStart, getStrand, hashCode, intersects, intersects, intersects, intersects, intersectSize, isCircular, isSameChromo, isStrandMinus, isStrandPlus, isValid, setChromosomeNameOri, setEnd, setId, setParent, setStart, setStrandMinus, shiftCoordinates, size, toStringAsciiArt, toStrPos
Methods inherited from class java.lang.Object
equals, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
FILTER_PASS
-
WITHIN_FIELD_SEP
public static final char WITHIN_FIELD_SEP
-
SUB_FIELD_SEP
-
EMPTY_STRING_ARRAY
-
ALLELE_FEQUENCY_COMMON
public static final double ALLELE_FEQUENCY_COMMON
-
ALLELE_FEQUENCY_LOW
public static final double ALLELE_FEQUENCY_LOW
-
INFO_KEY_PATTERN
-
VCF_INFO_END
-
VCF_ALT_NON_REF
-
VCF_ALT_NON_REF_gVCF
-
VCF_ALT_MISSING_REF
-
VCF_ALT_NON_REF_gVCF_ARRAY
-
VCF_ALT_NON_REF_ARRAY
-
VCF_ALT_MISSING_REF_ARRAY
-
VCF_INFO_HOMS
-
VCF_INFO_HETS
-
VCF_INFO_NAS
-
VCF_INFO_PRIVATE
-
alts
-
altStr
-
chromosomeName
-
filter
-
format
-
formatFields
-
genotypeFields
-
genotypeFieldsStr
-
genotypeScores
protected byte[] genotypeScores -
info
-
infoStr
-
line
-
lineNum
protected int lineNum -
quality
-
ref
-
variants
-
vcfEffects
-
vcfFileIterator
-
vcfGenotypes
-
-
Constructor Details
-
VcfEntry
-
VcfEntry
Create a line from a file iterator
-
-
Method Details
-
cleanUnderscores
Return a string without leading, trailing and duplicated underscores -
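The behaviour described for cleanUnderscores can be illustrated with a minimal standalone sketch. It is not the SnpEff implementation; the class name CleanUnderscoresSketch and the regex-based approach are assumptions made for illustration only.

```java
// Illustrative sketch only: NOT the SnpEff source. The class name is hypothetical.
final class CleanUnderscoresSketch {

    /** Collapse runs of '_' into a single '_', then drop one at the start and one at the end. */
    static String cleanUnderscores(String s) {
        String collapsed = s.replaceAll("_+", "_");   // "a__b" becomes "a_b"
        return collapsed.replaceAll("^_|_$", "");     // strip leading/trailing '_'
    }

    public static void main(String[] args) {
        System.out.println(cleanUnderscores("__missense__variant_"));
    }
}
```

Collapsing first means at most one underscore can remain at each end, so a single anchored replacement is enough for the trimming step.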
isEmpty
Does 'value' represent an EMPTY / MISSING value in a VCF field? (or multiple MISSING comma-separated values) -
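A check of this kind might look like the sketch below. It assumes that a missing value is written as "." (the usual VCF convention) and that null and the empty string also count as missing; the exact rules isEmpty applies are not shown on this page, and the class and method names are hypothetical.

```java
// Illustrative sketch only: NOT the SnpEff source. Names are hypothetical.
final class MissingValueSketch {

    /** True for null, "", ".", or a comma-separated list made only of "." items. */
    static boolean isMissing(String value) {
        if (value == null || value.isEmpty()) return true;
        // limit -1 keeps trailing empty items, so "., " style input is not silently shortened
        for (String part : value.split(",", -1)) {
            if (!part.equals(".")) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMissing(".,.,."));
        System.out.println(isMissing(".,A"));
    }
}
```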
isValidInfoKey
Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3) -
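As an illustration, a key check against a regular expression could be sketched as follows. The pattern used here is the one the VCF 4.3 specification is understood to give for INFO keys; the actual value of INFO_KEY_PATTERN is not shown on this page, so treat both the pattern and the class name InfoKeySketch as assumptions.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: NOT the SnpEff source. Names are hypothetical.
final class InfoKeySketch {

    // Assumed pattern: a letter or '_' followed by letters, digits, '_' or '.',
    // with "1000G" allowed as a special case.
    static final Pattern INFO_KEY = Pattern.compile("^([A-Za-z_][0-9A-Za-z_.]*|1000G)$");

    static boolean isValidInfoKey(String key) {
        return key != null && INFO_KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidInfoKey("DP"));
        System.out.println(isValidInfoKey("2DP"));
    }
}
```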
isValidInfoValue
Check that this value can be added to an INFO field
- Returns:
true if OK, false if the value is invalid
-
vcfInfoDecode
Decode INFO value -
vcfInfoEncode
Encode a string to be used in an 'INFO' field value.
From the VCF 4.3 specification: characters with special meaning (such as field delimiters ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding:
%3A : (colon)
%3B ; (semicolon)
%3D = (equal sign)
%25 % (percent sign)
%2C , (comma)
%0D CR
%0A LF
%09 TAB
-
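The percent encoding listed for vcfInfoEncode, and the reverse step performed by vcfInfoDecode, can be sketched in standalone Java as below. This is a minimal illustration of the mapping, not the SnpEff implementation; the class name InfoPercentEncodingSketch and the method names encode and decode are hypothetical.

```java
// Illustrative sketch only: NOT the SnpEff source. Names are hypothetical.
final class InfoPercentEncodingSketch {

    /** Replace each special character with its capitalized percent code. */
    static String encode(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case ':':  sb.append("%3A"); break;
                case ';':  sb.append("%3B"); break;
                case '=':  sb.append("%3D"); break;
                case '%':  sb.append("%25"); break;
                case ',':  sb.append("%2C"); break;
                case '\r': sb.append("%0D"); break;
                case '\n': sb.append("%0A"); break;
                case '\t': sb.append("%09"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverse of encode. "%25" is handled last so a decoded '%' is never re-read as a code. */
    static String decode(String value) {
        return value.replace("%3A", ":")
                    .replace("%3B", ";")
                    .replace("%3D", "=")
                    .replace("%2C", ",")
                    .replace("%0D", "\r")
                    .replace("%0A", "\n")
                    .replace("%09", "\t")
                    .replace("%25", "%");
    }

    public static void main(String[] args) {
        String raw = "score=0.5;note:50%,ok";
        String enc = encode(raw);
        System.out.println(enc);
        System.out.println(decode(enc).equals(raw));
    }
}
```

Encoding the percent sign itself is what makes the scheme reversible: every '%' in an encoded value is guaranteed to start one of the eight codes.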
vcfInfoKeySafe
Return a string safe to be used in an 'INFO' field key -
vcfInfoValueSafe
Return a string safe to be used in an 'INFO' field value -
addFilter
Add string to FILTER field -
addFormat
Add a 'FORMAT' field -
addGenotype
Add a genotype as a string -
addInfo
Add a "key=value" tuple to the INFO field
- Parameters:
key - INFO key name
value - Can be null if it is a boolean field.
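To make the null-value case concrete, here is a minimal sketch of appending an entry to an INFO string. It assumes entries are separated by ';' and that "." or an empty string stands for an empty INFO column, as in the VCF format; how addInfo stores entries internally is not shown on this page, and the names below are hypothetical.

```java
// Illustrative sketch only: NOT the SnpEff source. Names are hypothetical.
final class InfoAppendSketch {

    /** Append "key=value", or just "key" for a flag (null value), to an INFO string. */
    static String addInfo(String infoStr, String key, String value) {
        String entry = (value == null) ? key : key + "=" + value;
        // An absent INFO column is treated as empty, so the new entry replaces it.
        if (infoStr == null || infoStr.isEmpty() || infoStr.equals(".")) return entry;
        return infoStr + ";" + entry;
    }

    public static void main(String[] args) {
        System.out.println(addInfo("DP=10", "AF", "0.5"));
        System.out.println(addInfo(".", "DB", null));
    }
}
```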
-
alleleFrequencyType
Categorization by allele frequency -
calcHetero
Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file. Otherwise the field is null.
-
check
Perform several simple checks and report problems (if any). -
cloneShallow
Description copied from class: Marker
Perform a shallow clone
- Overrides:
cloneShallow in class Marker
-
compressGenotypes
public boolean compressGenotypes()
Compress genotypes into "HO/HE/NA" INFO fields
-
delFilter
Remove a string from FILTER field -
getAltIndex
Get index of matching ALT entry
- Returns:
-1 if not found
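The "-1 if not found" contract can be shown with a small standalone lookup. This sketch assumes an exact, case-sensitive string comparison against the ALT array; the comparison getAltIndex actually uses is not documented here, and the class name is hypothetical.

```java
// Illustrative sketch only: NOT the SnpEff source. Names are hypothetical.
final class AltIndexSketch {

    /** Return the position of alt in alts, or -1 if it is not present. */
    static int getAltIndex(String[] alts, String alt) {
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals(alt)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] alts = { "C", "T" };
        System.out.println(getAltIndex(alts, "T"));
        System.out.println(getAltIndex(alts, "G"));
    }
}
```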
-
getAlts
-
getAltsStr
Create a comma separated ALTS string -
getChromosomeNameOri
Original chromosome name (as it appeared in the VCF file)
- Overrides:
getChromosomeNameOri in class Interval
-
getFilter
-
getFormat
-
getFormatFields
-
getGenotypesScores
public byte[] getGenotypesScores()
Return genotypes parsed as an array of codes
-
getInfo
Get info string -
getInfo
Get info string for a specific allele -
getInfo
Get an INFO field matching a variant -
getInfoFlag
Does the entry exist?
-
getInfoFloat
Get info field as a 'double' number. The norm specifies the data type as 'FLOAT', which is why the name of this method might not be intuitive.
-
getInfoInt
Get info field as a 'long' number. The norm specifies the data type as 'INT', which is why the name of this method might not be intuitive.
-
getInfoKeys
Get all keys available in the info field -
getInfoStr
Get the full (unparsed) INFO field -
getLine
Original VCF line (from file) -
getLineNum
public int getLineNum() -
getNumberOfSamples
public int getNumberOfSamples()
Number of samples in this VCF file
-
getQuality
public double getQuality() -
getRef
-
getStr
-
getVcfEffects
-
getVcfEffects
Parse 'EFF' info field and get a list of effects -
getVcfFileIterator
-
getVcfGenotype
-
getVcfGenotypes
-
getVcfInfo
Get VcfInfo type for a given ID -
getVcfInfoNumber
Get Info number for a given ID -
hasField
-
hasGenotypes
public boolean hasGenotypes() -
hasInfo
-
hasQuality
public boolean hasQuality() -
isBiAllelic
public boolean isBiAllelic()
Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
-
isCompressedGenotypes
public boolean isCompressedGenotypes()
Do we have compressed genotypes in "HO,HE,NA" INFO fields?
-
isFilterPass
public boolean isFilterPass() -
isMultiallelic
public boolean isMultiallelic()
Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
-
isShowWarningIfParentDoesNotInclude
protected boolean isShowWarningIfParentDoesNotInclude()
Description copied from class: Marker
Show an error if parent does not include child?
- Overrides:
isShowWarningIfParentDoesNotInclude in class Marker
-
isSingleSnp
public boolean isSingleSnp()
Is this a VCF entry with a single SNP?
-
isSingleton
public boolean isSingleton()
Is this variant a singleton (appears only in one genotype)?
-
isVariant
public boolean isVariant()
Is this a change or are the ALTs actually the same as the reference?
-
isVariant
Is this ALT string a variant? -
iterator
- Specified by:
iterator in interface Iterable<VcfGenotype>
-
mac
public int mac()
Calculate minor allele count
-
maf
public double maf()
Calculate minor allele frequency
-
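For readers unfamiliar with the two quantities, the sketch below computes a minor allele count and frequency from per-sample ALT allele counts. It assumes diploid, bi-allelic genotypes coded as 0, 1 or 2 ALT alleles with -1 for a missing call; this coding, the class name MafSketch and the handling of missing calls are assumptions, and the way mac() and maf() derive their values from the entry is not shown on this page.

```java
// Illustrative sketch only: NOT the SnpEff source. Names and genotype coding are hypothetical.
final class MafSketch {

    /** Minor allele count: the smaller of the REF and ALT allele counts among called samples. */
    static int mac(int[] altCounts) {
        int alt = 0, called = 0;
        for (int c : altCounts) {
            if (c < 0) continue;   // missing genotype, skipped
            alt += c;
            called += 2;           // diploid: two alleles per called sample
        }
        int ref = called - alt;
        return Math.min(alt, ref);
    }

    /** Minor allele frequency: minor allele count over the number of called alleles. */
    static double maf(int[] altCounts) {
        int called = 0;
        for (int c : altCounts) {
            if (c >= 0) called += 2;
        }
        return called == 0 ? 0.0 : (double) mac(altCounts) / called;
    }

    public static void main(String[] args) {
        int[] genotypes = { 0, 1, 2, -1, 0 };
        System.out.println(mac(genotypes));
        System.out.println(maf(genotypes));
    }
}
```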
parse
public void parse()
Parse a 'line' from a 'vcfFileIterator'
-
parseLof
Parse LOF from VcfEntry -
parseNmd
Parse NMD from VcfEntry -
removeInfo
Remove INFO field -
rmInfo
Parse INFO fields -
setFilter
-
setFormat
-
setGenotypeStr
-
setLineNum
public void setLineNum(int lineNum) -
toStr
To string as a simple "CHR:START_REF/ALTs" format -
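The "CHR:START_REF/ALTs" layout can be illustrated with a short formatting sketch. It assumes that multiple ALTs are joined with commas and makes no claim about whether START is zero-based or one-based in the real output; the class name and parameters are hypothetical.

```java
// Illustrative sketch only: NOT the SnpEff source. Names are hypothetical.
final class ToStrSketch {

    /** Build a "CHR:START_REF/ALTs" string; ALTs are comma-joined (an assumption). */
    static String toStr(String chr, int start, String ref, String[] alts) {
        return chr + ":" + start + "_" + ref + "/" + String.join(",", alts);
    }

    public static void main(String[] args) {
        System.out.println(toStr("1", 12345, "A", new String[] { "C", "T" }));
    }
}
```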
toString
-
toStringNoGt
Show only first eight fields (no genotype entries) -
uncompressGenotypes
Uncompress VCF entry having genotypes in "HO,HE,NA" fields -
variants
Create a list of variants from this VcfEntry
-