Package picard.fingerprint
Class FingerprintChecker
- java.lang.Object
-
- picard.fingerprint.FingerprintChecker
-
public class FingerprintChecker extends Object
Major class that coordinates the activities involved in comparing genetic fingerprint data, whether the data comes from a genotyping platform or is derived from sequence data.
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static double
DEFAULT_GENOTYPING_ERROR_RATE
static int
DEFAULT_MAXIMAL_PL_DIFFERENCE
static int
DEFAULT_MINIMUM_BASE_QUALITY
static int
DEFAULT_MINIMUM_MAPPING_QUALITY
-
Constructor Summary
Constructors (Constructor, Description):
FingerprintChecker(File haplotypeData)
Creates a fingerprint checker that will work with the set of haplotypes stored in the supplied file.
FingerprintChecker(HaplotypeMap haplotypes)
Creates a fingerprint checker that will work with the set of haplotypes provided.
-
Method Summary
Methods (Modifier and Type, Method, Description):
static MatchResults
calculateMatchResults(Fingerprint observedFp, Fingerprint expectedFp)
Compares two fingerprints and calculates a MatchResults object which contains detailed information about the match (or mismatch) between fingerprints, including the LOD score for whether or not the two are likely from the same sample.
static MatchResults
calculateMatchResults(Fingerprint observedFp, Fingerprint expectedFp, double minPExpected, double pLoH)
static MatchResults
calculateMatchResults(Fingerprint observedFp, Fingerprint expectedFp, double minPExpected, double pLoH, boolean calculateLocusInfo, boolean calculateTumorAwareLod)
Compares two fingerprints and calculates a MatchResults object which contains detailed information about the match (or mismatch) between fingerprints, including the LOD score for whether or not the two are likely from the same sample.
List<FingerprintResults>
checkFingerprints(List<Path> samFiles, List<Path> genotypeFiles, String specificSample, boolean ignoreReadGroups)
Top level method to take a set of one or more SAM files and one or more Genotype files and compare each read group in each SAM file to each set of fingerprint genotypes.
List<FingerprintResults>
checkFingerprintsFromPaths(List<Path> observedGenotypeFiles, List<Path> expectedGenotypeFiles, String observedSample, String expectedSample)
Top level method to take a set of one or more observed genotype (VCF) files and one or more expected genotype (VCF) files, compare one or more samples in the observed genotype file with one or more in the expected file, and generate results for each set.
Map<FingerprintIdDetails,Fingerprint>
fingerprintFiles(Collection<Path> files, int threads, int waitTime, TimeUnit waitTimeUnit)
Fingerprints one or more SAM/BAM/VCF files at all available loci within the haplotype map, using multiple threads to speed up the processing.
Map<FingerprintIdDetails,Fingerprint>
fingerprintSamFile(Path samFile, htsjdk.samtools.util.IntervalList loci)
Generates a Fingerprint per read group in the supplied SAM file using the loci provided in the interval list.
Map<FingerprintIdDetails,Fingerprint>
fingerprintVcf(Path vcfFile)
htsjdk.samtools.SAMFileHeader
getHeader()
htsjdk.samtools.util.IntervalList
getLociToGenotype(Collection<Fingerprint> fingerprints)
Takes a set of fingerprints and returns an IntervalList containing all the loci that can be productively examined in sequencing data to compare to one or more of the fingerprints.
htsjdk.samtools.ValidationStringency
getValidationStringency()
Map<String,Fingerprint>
identifyContaminant(Path samFile, double contamination, int locusMaxReads)
Generates a per-sample Fingerprint for the contaminant in the supplied SAM file.
Map<String,Fingerprint>
loadFingerprints(Path fingerprintFile, String specificSample)
Loads genotypes from the supplied file into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
Map<String,Fingerprint>
loadFingerprintsFromIndexedVcf(Path fingerprintFile, String specificSample)
Loads genotypes from the supplied file into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
Map<String,Fingerprint>
loadFingerprintsFromNonIndexedVcf(Path fingerprintFile, String specificSample)
Loads genotypes from the supplied file into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
Map<String,Fingerprint>
loadFingerprintsFromQueriableReader(htsjdk.variant.vcf.VCFFileReader reader, String specificSample, Path source)
Loads genotypes from the supplied reader into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
Map<String,Fingerprint>
loadFingerprintsFromVariantContexts(Iterable<htsjdk.variant.variantcontext.VariantContext> iterable, String specificSample, Path source)
Loads genotypes from the supplied iterable into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
protected static <T> List<T>
randomSublist(List<T> list, int n)
A small utility function to choose n random elements (un-shuffled) from a list.
void
setAllowDuplicateReads(boolean allowDuplicateReads)
Sets whether duplicate reads should be allowed when calling genotypes from SAM files.
void
setGenotypingErrorRate(double genotypingErrorRate)
Sets the assumed genotyping error rate used when accurate error rates are not available.
void
setmaximalPLDifference(int maximalPLDifference)
Sets the maximal difference in PL scores considered when reading PLs from a VCF.
void
setMinimumBaseQuality(int minimumBaseQuality)
Sets the minimum base quality for bases used when computing a fingerprint from sequence data.
void
setMinimumMappingQuality(int minimumMappingQuality)
Sets the minimum mapping quality for reads used when computing fingerprints from sequence data.
void
setpLossofHet(double pLossofHet)
void
setValidationStringency(htsjdk.samtools.ValidationStringency validationStringency)
-
-
-
Field Detail
-
DEFAULT_GENOTYPING_ERROR_RATE
public static final double DEFAULT_GENOTYPING_ERROR_RATE
- See Also:
- Constant Field Values
-
DEFAULT_MINIMUM_MAPPING_QUALITY
public static final int DEFAULT_MINIMUM_MAPPING_QUALITY
- See Also:
- Constant Field Values
-
DEFAULT_MINIMUM_BASE_QUALITY
public static final int DEFAULT_MINIMUM_BASE_QUALITY
- See Also:
- Constant Field Values
-
DEFAULT_MAXIMAL_PL_DIFFERENCE
public static final int DEFAULT_MAXIMAL_PL_DIFFERENCE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
FingerprintChecker
public FingerprintChecker(File haplotypeData)
Creates a fingerprint checker that will work with the set of haplotypes stored in the supplied file.
-
FingerprintChecker
public FingerprintChecker(HaplotypeMap haplotypes)
Creates a fingerprint checker that will work with the set of haplotypes provided.
-
-
Method Detail
-
getValidationStringency
public htsjdk.samtools.ValidationStringency getValidationStringency()
-
setValidationStringency
public void setValidationStringency(htsjdk.samtools.ValidationStringency validationStringency)
-
setMinimumBaseQuality
public void setMinimumBaseQuality(int minimumBaseQuality)
Sets the minimum base quality for bases used when computing a fingerprint from sequence data.
-
setMinimumMappingQuality
public void setMinimumMappingQuality(int minimumMappingQuality)
Sets the minimum mapping quality for reads used when computing fingerprints from sequence data.
-
setGenotypingErrorRate
public void setGenotypingErrorRate(double genotypingErrorRate)
Sets the assumed genotyping error rate used when accurate error rates are not available.
-
setmaximalPLDifference
public void setmaximalPLDifference(int maximalPLDifference)
Sets the maximal difference in PL scores considered when reading PLs from a VCF.
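To illustrate what limiting the PL difference can mean, here is a minimal sketch. It is not Picard's implementation: it assumes the cap is applied relative to the smallest PL at a site, and the class `PlCapSketch` and both of its methods are hypothetical names. The conversion from PL to a relative likelihood, 10^(-PL/10), follows from PL being a phred-scaled likelihood.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of Picard.
class PlCapSketch {
    /** Returns a copy of pls in which no value exceeds min(pls) + maximalPLDifference (assumed behavior). */
    static int[] capPls(int[] pls, int maximalPLDifference) {
        int min = Arrays.stream(pls).min().orElse(0);
        int[] capped = new int[pls.length];
        for (int i = 0; i < pls.length; i++) {
            capped[i] = Math.min(pls[i], min + maximalPLDifference);
        }
        return capped;
    }

    /** Converts phred-scaled PLs into likelihoods normalized to sum to 1. */
    static double[] toNormalizedLikelihoods(int[] pls) {
        double[] out = new double[pls.length];
        double sum = 0;
        for (int i = 0; i < pls.length; i++) {
            out[i] = Math.pow(10, -pls[i] / 10.0);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```

Under this assumption, capping keeps a single very confident genotype call from driving the likelihood of the other genotypes to effectively zero.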
-
getHeader
public htsjdk.samtools.SAMFileHeader getHeader()
-
setAllowDuplicateReads
public void setAllowDuplicateReads(boolean allowDuplicateReads)
Sets whether duplicate reads should be allowed when calling genotypes from SAM files. This is useful when comparing read groups within a SAM file and individual read groups show artifactually high duplication (e.g. a single-ended read group mixed in with paired-end read groups).
- Parameters:
allowDuplicateReads
- should fingerprinting use duplicate reads?
-
setpLossofHet
public void setpLossofHet(double pLossofHet)
-
loadFingerprints
public Map<String,Fingerprint> loadFingerprints(Path fingerprintFile, String specificSample)
Loads genotypes from the supplied file into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
- Parameters:
fingerprintFile
- VCF file containing genotypes for one or more samples
specificSample
- null to load genotypes for all samples contained in the file, or the name of an individual sample to load (excluding all others).
- Returns:
- a Map of Sample name to Fingerprint
-
loadFingerprintsFromNonIndexedVcf
public Map<String,Fingerprint> loadFingerprintsFromNonIndexedVcf(Path fingerprintFile, String specificSample)
Loads genotypes from the supplied file into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
- Parameters:
fingerprintFile
- VCF file containing genotypes for one or more samples
specificSample
- null to load genotypes for all samples contained in the file, or the name of an individual sample to load (excluding all others).
- Returns:
- a Map of Sample name to Fingerprint
-
loadFingerprintsFromVariantContexts
public Map<String,Fingerprint> loadFingerprintsFromVariantContexts(Iterable<htsjdk.variant.variantcontext.VariantContext> iterable, String specificSample, Path source)
Loads genotypes from the supplied iterable into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
- Parameters:
iterable
- an iterable over VariantContexts containing genotypes for one or more samples
specificSample
- null to load genotypes for all samples contained in the file, or the name of an individual sample to load (excluding all others).
source
- the path of the source file; used to emit errors and to annotate the fingerprints.
- Returns:
- a Map of Sample name to Fingerprint
-
loadFingerprintsFromIndexedVcf
public Map<String,Fingerprint> loadFingerprintsFromIndexedVcf(Path fingerprintFile, String specificSample)
Loads genotypes from the supplied file into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
- Parameters:
fingerprintFile
- VCF file containing genotypes for one or more samples
specificSample
- null to load genotypes for all samples contained in the file, or the name of an individual sample to load (excluding all others).
- Returns:
- a Map of Sample name to Fingerprint
-
loadFingerprintsFromQueriableReader
public Map<String,Fingerprint> loadFingerprintsFromQueriableReader(htsjdk.variant.vcf.VCFFileReader reader, String specificSample, Path source)
Loads genotypes from the supplied reader into one or more Fingerprint objects and returns them in a Map of Sample->Fingerprint.
- Parameters:
reader
- VCF reader containing genotypes for one or more samples
specificSample
- null to load genotypes for all samples contained in the file, or the name of an individual sample to load (excluding all others).
source
- the path of the source file; used to emit errors.
- Returns:
- a Map of Sample name to Fingerprint
-
getLociToGenotype
public htsjdk.samtools.util.IntervalList getLociToGenotype(Collection<Fingerprint> fingerprints)
Takes a set of fingerprints and returns an IntervalList containing all the loci that can be productively examined in sequencing data to compare to one or more of the fingerprints.
-
fingerprintVcf
public Map<FingerprintIdDetails,Fingerprint> fingerprintVcf(Path vcfFile)
-
fingerprintSamFile
public Map<FingerprintIdDetails,Fingerprint> fingerprintSamFile(Path samFile, htsjdk.samtools.util.IntervalList loci)
Generates a Fingerprint per read group in the supplied SAM file using the loci provided in the interval list.
-
identifyContaminant
public Map<String,Fingerprint> identifyContaminant(Path samFile, double contamination, int locusMaxReads)
Generates a per-sample Fingerprint for the contaminant in the supplied SAM file. Data is aggregated by sample, not read-group.
-
randomSublist
protected static <T> List<T> randomSublist(List<T> list, int n)
A small utility function to choose n random elements (un-shuffled) from a list.
- Parameters:
list
- a list of elements
n
- the number of elements requested from list
- Returns:
- a list of n randomly chosen elements from list, in their original order. If the list has fewer than n elements, it is returned in its entirety.
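The contract above (n random elements, original order preserved, whole list returned when it is too short) can be met in several ways. The following is one minimal sketch using selection sampling. It is not Picard's code: the class name `RandomSublistSketch` and the extra `Random` parameter, added here so the result is reproducible, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not part of Picard.
class RandomSublistSketch {
    /** Chooses n elements uniformly at random while keeping their original relative order. */
    static <T> List<T> randomSublist(List<T> list, int n, Random random) {
        // A list with n or fewer elements is returned in its entirety.
        if (list.size() <= n) {
            return list;
        }
        List<T> result = new ArrayList<>(Math.max(n, 0));
        int needed = n;                // elements still to be chosen
        int remaining = list.size();   // elements not yet visited
        for (T item : list) {
            if (needed <= 0) {
                break;
            }
            // Select the current element with probability needed / remaining.
            if (random.nextInt(remaining) < needed) {
                result.add(item);
                needed--;
            }
            remaining--;
        }
        return result;
    }
}
```

Because elements are visited in list order and only ever appended, the output needs no sorting step to restore the original order.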
-
fingerprintFiles
public Map<FingerprintIdDetails,Fingerprint> fingerprintFiles(Collection<Path> files, int threads, int waitTime, TimeUnit waitTimeUnit)
Fingerprints one or more SAM/BAM/VCF files at all available loci within the haplotype map, using multiple threads to speed up the processing.
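The parameters threads, waitTime and waitTimeUnit suggest a fixed-size worker pool with a bounded wait for completion. The sketch below shows that general pattern with java.util.concurrent; it is an assumption about the shape of the work, not Picard's implementation, and `ParallelFingerprintSketch`, `processAll` and the `worker` function are hypothetical names.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch; not part of Picard.
class ParallelFingerprintSketch {
    /**
     * Applies worker to every input on a fixed-size pool and waits at most
     * waitTime (in waitTimeUnit) for all tasks to finish.
     * The worker must not return null, because ConcurrentHashMap rejects null values.
     */
    static <I, R> Map<I, R> processAll(Collection<I> inputs, Function<I, R> worker,
                                       int threads, int waitTime, TimeUnit waitTimeUnit) {
        Map<I, R> results = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (I input : inputs) {
            executor.execute(() -> {
                results.put(input, worker.apply(input));
            });
        }
        // Stop accepting new tasks, then wait for the submitted ones.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(waitTime, waitTimeUnit)) {
                executor.shutdownNow();
                throw new IllegalStateException(
                        "Work did not finish within " + waitTime + " " + waitTimeUnit);
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return results;
    }
}
```

In this sketch a task that throws leaves no entry in the result map, so a caller that needs every input accounted for would compare the result size with the input size.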
-
checkFingerprints
public List<FingerprintResults> checkFingerprints(List<Path> samFiles, List<Path> genotypeFiles, String specificSample, boolean ignoreReadGroups)
Top level method to take a set of one or more SAM files and one or more Genotype files and compare each read group in each SAM file to each set of fingerprint genotypes.
- Parameters:
samFiles
- the list of SAM files to fingerprint
genotypeFiles
- the list of genotype files from which to pull fingerprint genotypes
specificSample
- an optional single sample whose genotypes to load from the supplied files
ignoreReadGroups
- aggregate data into one fingerprint per file, instead of splitting by read group (RG)
-
checkFingerprintsFromPaths
public List<FingerprintResults> checkFingerprintsFromPaths(List<Path> observedGenotypeFiles, List<Path> expectedGenotypeFiles, String observedSample, String expectedSample)
Top level method to take a set of one or more observed genotype (VCF) files and one or more expected genotype (VCF) files, compare one or more samples in the observed genotype file with one or more in the expected file, and generate results for each set.
- Parameters:
observedGenotypeFiles
- the list of genotype files containing observed calls, from which to pull fingerprint genotypes
expectedGenotypeFiles
- the list of genotype files containing expected calls, from which to pull fingerprint genotypes
observedSample
- an optional single sample whose genotypes to load from the observed genotype file (if null, use all)
expectedSample
- an optional single sample whose genotypes to load from the expected genotype file (if null, use all)
-
calculateMatchResults
public static MatchResults calculateMatchResults(Fingerprint observedFp, Fingerprint expectedFp, double minPExpected, double pLoH)
-
calculateMatchResults
public static MatchResults calculateMatchResults(Fingerprint observedFp, Fingerprint expectedFp, double minPExpected, double pLoH, boolean calculateLocusInfo, boolean calculateTumorAwareLod)
Compares two fingerprints and calculates a MatchResults object which contains detailed information about the match (or mismatch) between fingerprints, including the LOD score for whether or not the two are likely from the same sample. If comparing sequencing data to genotype data, the sequencing data should be passed as the observedFp and the genotype data as the expectedFp in order to get the best output.
In cases where the most likely genotypes from the two fingerprints do not match, the lExpectedSample is Max(actualpExpectedSample, minPExpected).
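The floor described above can be sketched in a few lines. The first method mirrors the stated rule, Max(actualpExpectedSample, minPExpected), applied only when the most likely genotypes disagree. The second method, a per-locus LOD written as a log10 likelihood ratio against a random-sample likelihood, is an assumption about the general form of the score rather than Picard's exact computation. The class and method names are hypothetical.

```java
// Hypothetical sketch; not part of Picard.
class MatchFloorSketch {
    /** Applies the minPExpected floor only when the most likely genotypes do not match. */
    static double expectedSampleLikelihood(double actualPExpectedSample,
                                           double minPExpected,
                                           boolean mostLikelyGenotypesMatch) {
        return mostLikelyGenotypesMatch
                ? actualPExpectedSample
                : Math.max(actualPExpectedSample, minPExpected);
    }

    /** Assumed form of a per-locus LOD: log10(lExpectedSample / lRandomSample). */
    static double locusLod(double lExpectedSample, double lRandomSample) {
        return Math.log10(lExpectedSample) - Math.log10(lRandomSample);
    }
}
```

Under these assumptions, the floor bounds how strongly any single mismatching locus can count against a match: the per-locus LOD cannot fall below log10(minPExpected / lRandomSample).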
-
calculateMatchResults
public static MatchResults calculateMatchResults(Fingerprint observedFp, Fingerprint expectedFp)
Compares two fingerprints and calculates a MatchResults object which contains detailed information about the match (or mismatch) between fingerprints, including the LOD score for whether or not the two are likely from the same sample. If comparing sequencing data to genotype data, the sequencing data should be passed as the observedFp and the genotype data as the expectedFp in order to get the best output.
-
-