Class FastqToSam
- java.lang.Object
  - picard.cmdline.CommandLineProgram
    - picard.sam.FastqToSam
@DocumentedFeature public class FastqToSam extends CommandLineProgram
Converts a FASTQ file to an unaligned BAM or SAM file.
Output read records contain the original base calls. Quality scores are translated according to the base quality score encoding: FastqSanger, FastqSolexa, or FastqIllumina.
There are also arguments to provide values for SAM header and read attributes that are not present in FASTQ (e.g., see RG or SM below).
Inputs
One FASTQ file name for single-end sequencing input data, or two for paired-end. These files may be gzip-compressed (when the file name ends with ".gz").
Alternatively, for larger inputs you can provide a collection of FASTQ files indexed by their name (see USE_SEQUENTIAL_FASTQS below for details).
By default, this tool will try to guess the base quality score encoding. However, you can indicate it explicitly using the QUALITY_FORMAT argument.
Output
A single unaligned BAM or SAM file. By default, the records are sorted by query (read) name.
Usage examples
Example 1:
Single-end sequencing FASTQ file conversion. All reads are annotated as belonging to the "rg0013" read group that in turn is part of the sample "sample001".
java -jar picard.jar FastqToSam \
    F1=input_reads.fastq \
    O=unaligned_reads.bam \
    SM=sample001 \
    RG=rg0013
Example 2:
Similar to example 1 above, but for paired-end sequencing.
java -jar picard.jar FastqToSam \
    F1=forward_reads.fastq \
    F2=reverse_reads.fastq \
    O=unaligned_read_pairs.bam \
    SM=sample001 \
    RG=rg0013
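When the tool is launched from another Java program, the two examples above differ only in the optional F2 argument. The following is a minimal sketch of assembling that command line; the class `FastqToSamCommand` and its `build` method are hypothetical helpers, not part of Picard, and `picard.jar` is assumed to be in the working directory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Picard) that assembles the FastqToSam command line. */
final class FastqToSamCommand {

    /**
     * Builds the argument list used in Example 1 or Example 2.
     * Pass null for fastq2 to get the single-end form.
     */
    static List<String> build(String fastq1, String fastq2, String output,
                              String sample, String readGroup) {
        List<String> args = new ArrayList<>();
        args.add("java");
        args.add("-jar");
        args.add("picard.jar"); // assumption: the jar sits in the working directory
        args.add("FastqToSam");
        args.add("F1=" + fastq1);
        if (fastq2 != null) {
            args.add("F2=" + fastq2); // only present for paired-end input
        }
        args.add("O=" + output);
        args.add("SM=" + sample);
        args.add("RG=" + readGroup);
        return args;
    }

    public static void main(String[] argv) {
        List<String> cmd = build("forward_reads.fastq", "reverse_reads.fastq",
                "unaligned_read_pairs.bam", "sample001", "rg0013");
        // Only prints the command. To run it, the list could be handed to
        // new ProcessBuilder(cmd).inheritIO().start().
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the arguments as a list rather than one string avoids shell quoting problems when file names contain spaces.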
-
-
Field Summary
Modifier and Type | Field | Description
Boolean
ALLOW_AND_IGNORE_EMPTY_LINES
List<String>
COMMENT
String
DESCRIPTION
File
FASTQ
File
FASTQ2
String
LIBRARY_NAME
int
MAX_Q
int
MIN_Q
File
OUTPUT
String
PLATFORM
String
PLATFORM_MODEL
String
PLATFORM_UNIT
Integer
PREDICTED_INSERT_SIZE
String
PROGRAM_GROUP
htsjdk.samtools.util.FastqQualityFormat
QUALITY_FORMAT
String
READ_GROUP_NAME
htsjdk.samtools.util.Iso8601Date
RUN_DATE
String
SAMPLE_NAME
String
SEQUENCING_CENTER
htsjdk.samtools.SAMFileHeader.SortOrder
SORT_ORDER
Boolean
STRIP_UNPAIRED_MATE_NUMBER
Deprecated.
boolean
USE_SEQUENTIAL_FASTQS
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
-
Constructor Summary
Constructor | Description
FastqToSam()
-
Method Summary
Modifier and Type | Method | Description
htsjdk.samtools.SAMFileHeader
createSamFileHeader()
Creates a simple header with the values provided on the command line.
protected String[]
customCommandLineValidation()
Put any custom command-line validation in an override of this method.
static htsjdk.samtools.util.FastqQualityFormat
determineQualityFormat(htsjdk.samtools.fastq.FastqReader reader1, htsjdk.samtools.fastq.FastqReader reader2, htsjdk.samtools.util.FastqQualityFormat expectedQuality)
Looks at fastq input(s) and attempts to determine the proper quality format. Closes the reader(s) by side effect.
protected int
doPaired(htsjdk.samtools.fastq.FastqReader freader1, htsjdk.samtools.fastq.FastqReader freader2, htsjdk.samtools.SAMFileWriter writer)
More complicated method that takes two fastq files and builds pairing information in the SAM.
protected int
doUnpaired(htsjdk.samtools.fastq.FastqReader freader, htsjdk.samtools.SAMFileWriter writer)
Creates a simple SAM file from a single fastq file.
protected int
doWork()
Do the work after command line has been parsed.
protected static List<File>
getSequentialFileList(File baseFastq)
Get a list of FASTQs that are sequentially numbered based on the first (base) fastq.
static void
main(String[] argv)
Stock main method.
void
makeItSo(htsjdk.samtools.fastq.FastqReader reader1, htsjdk.samtools.fastq.FastqReader reader2, htsjdk.samtools.SAMFileWriter writer)
Handles the FastqToSam execution on the FastqReader(s).
-
Methods inherited from class picard.cmdline.CommandLineProgram
getCommandLine, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
-
-
-
-
Field Detail
-
FASTQ
@Argument(shortName="F1", doc="Input fastq file (optionally gzipped) for single end data, or first read in paired end data.") public File FASTQ
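To make the input format concrete, here is a minimal sketch of reading four-line FASTQ records, decompressing when the file name ends with ".gz". It is a stand-alone illustration using only the JDK; `FastqSketch`, `Read`, `readLines`, and `parse` are hypothetical names, and the tool itself reads input through htsjdk's `FastqReader` rather than code like this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical stand-alone reader, for illustration only. */
final class FastqSketch {

    /** One FASTQ record: name (without the leading '@'), bases, raw quality string. */
    static final class Read {
        final String name;
        final String bases;
        final String qualities;

        Read(String name, String bases, String qualities) {
            this.name = name;
            this.bases = bases;
            this.qualities = qualities;
        }
    }

    /** Reads all lines, decompressing when the file name ends with ".gz". */
    static List<String> readLines(Path path) {
        boolean gzipped = path.getFileName().toString().endsWith(".gz");
        try (InputStream raw = Files.newInputStream(path);
             InputStream in = gzipped ? new GZIPInputStream(raw) : raw;
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            List<String> lines = new ArrayList<>();
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Groups lines into four-line records and checks the basic FASTQ layout. */
    static List<Read> parse(List<String> lines) {
        if (lines.size() % 4 != 0) {
            throw new IllegalArgumentException("truncated FASTQ record");
        }
        List<Read> reads = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += 4) {
            String header = lines.get(i);
            String bases = lines.get(i + 1);
            String separator = lines.get(i + 2);
            String qualities = lines.get(i + 3);
            if (!header.startsWith("@") || !separator.startsWith("+")
                    || bases.length() != qualities.length()) {
                throw new IllegalArgumentException("malformed FASTQ record at line " + (i + 1));
            }
            reads.add(new Read(header.substring(1), bases, qualities));
        }
        return reads;
    }

    public static void main(String[] argv) {
        List<Read> reads = parse(List.of("@read1", "ACGT", "+", "IIII"));
        System.out.println(reads.get(0).name + " " + reads.get(0).bases);
    }
}
```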
-
FASTQ2
@Argument(shortName="F2", doc="Input fastq file (optionally gzipped) for the second read of paired end data.", optional=true) public File FASTQ2
-
USE_SEQUENTIAL_FASTQS
@Argument(doc="Use sequential fastq files with the suffix <prefix>_###.fastq or <prefix>_###.fastq.gz", optional=true) public boolean USE_SEQUENTIAL_FASTQS
-
QUALITY_FORMAT
@Argument(shortName="V", doc="A value describing how the quality values are encoded in the input FASTQ file. Either Solexa (phred scaling + 66), Illumina (phred scaling + 64) or Standard (phred scaling + 33). If this value is not specified, the quality format will be detected automatically.", optional=true) public htsjdk.samtools.util.FastqQualityFormat QUALITY_FORMAT
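As a rough illustration of what the offsets mean, the sketch below converts a FASTQ quality string to numeric scores for the Standard (+33) and Illumina (+64) encodings. `QualityDecoder` is a hypothetical class, not htsjdk's implementation; Solexa is left out because it uses a different score scale, and the `looksLikeStandard` heuristic is a deliberate simplification of automatic detection.

```java
/** Hypothetical decoder for illustration; not htsjdk's implementation. */
final class QualityDecoder {
    static final int STANDARD_OFFSET = 33; // "Standard (phred scaling + 33)"
    static final int ILLUMINA_OFFSET = 64; // "Illumina (phred scaling + 64)"

    /** Subtracts the encoding offset from each ASCII character. */
    static int[] decode(String qualities, int offset) {
        int[] scores = new int[qualities.length()];
        for (int i = 0; i < qualities.length(); i++) {
            scores[i] = qualities.charAt(i) - offset;
        }
        return scores;
    }

    /**
     * Simplified detection: a character below ASCII 59 would give a score
     * under -5 with a +64 offset, so such input is taken to be Standard.
     * Returning false means "undecided", not "definitely +64".
     */
    static boolean looksLikeStandard(String qualities) {
        for (int i = 0; i < qualities.length(); i++) {
            if (qualities.charAt(i) < 59) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] argv) {
        // 'I' is ASCII 73, so it decodes to 40 with the Standard offset.
        System.out.println(decode("I", STANDARD_OFFSET)[0]);
    }
}
```

The same character can be a valid score under more than one encoding, which is why passing QUALITY_FORMAT explicitly is the safer choice when the encoding is known.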
-
OUTPUT
@Argument(doc="Output SAM/BAM file. ", shortName="O") public File OUTPUT
-
READ_GROUP_NAME
@Argument(shortName="RG", doc="Read group name") public String READ_GROUP_NAME
-
SAMPLE_NAME
@Argument(shortName="SM", doc="Sample name to insert into the read group header") public String SAMPLE_NAME
-
LIBRARY_NAME
@Argument(shortName="LB", doc="The library name to place into the LB attribute in the read group header", optional=true) public String LIBRARY_NAME
-
PLATFORM_UNIT
@Argument(shortName="PU", doc="The platform unit (often run_barcode.lane) to insert into the read group header", optional=true) public String PLATFORM_UNIT
-
PLATFORM
@Argument(shortName="PL", doc="The platform type (e.g. illumina, solid) to insert into the read group header", optional=true) public String PLATFORM
-
SEQUENCING_CENTER
@Argument(shortName="CN", doc="The sequencing center from which the data originated", optional=true) public String SEQUENCING_CENTER
-
PREDICTED_INSERT_SIZE
@Argument(shortName="PI", doc="Predicted median insert size, to insert into the read group header", optional=true) public Integer PREDICTED_INSERT_SIZE
-
PROGRAM_GROUP
@Argument(shortName="PG", doc="Program group to insert into the read group header.", optional=true) public String PROGRAM_GROUP
-
PLATFORM_MODEL
@Argument(shortName="PM", doc="Platform model to insert into the group header (free-form text providing further details of the platform/technology used)", optional=true) public String PLATFORM_MODEL
-
COMMENT
@Argument(doc="Comment(s) to include in the merged output file\'s header.", optional=true, shortName="CO") public List<String> COMMENT
-
DESCRIPTION
@Argument(shortName="DS", doc="Inserted into the read group header", optional=true) public String DESCRIPTION
-
RUN_DATE
@Argument(shortName="DT", doc="Date the run was produced, to insert into the read group header", optional=true) public htsjdk.samtools.util.Iso8601Date RUN_DATE
-
SORT_ORDER
@Argument(shortName="SO", doc="The sort order for the output sam/bam file.") public htsjdk.samtools.SAMFileHeader.SortOrder SORT_ORDER
-
MIN_Q
@Argument(doc="Minimum quality allowed in the input fastq. An exception will be thrown if a quality is less than this value.") public int MIN_Q
-
MAX_Q
@Argument(doc="Maximum quality allowed in the input fastq. An exception will be thrown if a quality is greater than this value.") public int MAX_Q
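The MIN_Q and MAX_Q bounds amount to a range check on each decoded quality score. A minimal sketch of such a check follows; `QualityRangeCheck` is a hypothetical class, `IllegalArgumentException` is a stand-in because the exception type the tool throws is not stated here, and the bounds 0 and 93 in `main` are example values only.

```java
/** Hypothetical range check for illustration; not Picard's implementation. */
final class QualityRangeCheck {

    /** Throws if any decoded score falls outside [minQ, maxQ]. */
    static void check(int[] scores, int minQ, int maxQ) {
        for (int i = 0; i < scores.length; i++) {
            int q = scores[i];
            if (q < minQ || q > maxQ) {
                // Stand-in exception type; the real tool may throw something else.
                throw new IllegalArgumentException(
                        "quality " + q + " at position " + i
                                + " is outside [" + minQ + ", " + maxQ + "]");
            }
        }
    }

    public static void main(String[] argv) {
        check(new int[] {2, 30, 40}, 0, 93); // example bounds, passes silently
        System.out.println("all qualities within range");
    }
}
```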
-
STRIP_UNPAIRED_MATE_NUMBER
@Deprecated @Argument(doc="Deprecated (No longer used). If true and this is an unpaired fastq any occurrence of \'/1\' or \'/2\' will be removed from the end of a read name.") public Boolean STRIP_UNPAIRED_MATE_NUMBER
Deprecated.
-
ALLOW_AND_IGNORE_EMPTY_LINES
@Argument(doc="Allow (and ignore) empty lines") public Boolean ALLOW_AND_IGNORE_EMPTY_LINES
-
-
Method Detail
-
determineQualityFormat
public static htsjdk.samtools.util.FastqQualityFormat determineQualityFormat(htsjdk.samtools.fastq.FastqReader reader1, htsjdk.samtools.fastq.FastqReader reader2, htsjdk.samtools.util.FastqQualityFormat expectedQuality)
Looks at fastq input(s) and attempts to determine the proper quality format. Closes the reader(s) by side effect.
Parameters:
reader1 - The first fastq input
reader2 - The second fastq input, if necessary. To not use this input, set it to null
expectedQuality - If provided, will be used for sanity checking. If left null, autodetection will occur
-
main
public static void main(String[] argv)
Stock main method.
-
getSequentialFileList
protected static List<File> getSequentialFileList(File baseFastq)
Get a list of FASTQs that are sequentially numbered based on the first (base) fastq. The files should be named: <prefix>_001.<extension>, <prefix>_002.<extension>, ..., <prefix>_XYZ.<extension>. The base file should be: <prefix>_001.<extension>. An example would be:
RUNNAME_S8_L005_R1_001.fastq
RUNNAME_S8_L005_R1_002.fastq
RUNNAME_S8_L005_R1_003.fastq
RUNNAME_S8_L005_R1_004.fastq
where `baseFastq` is the first in that list.
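The naming scheme can be sketched as follows: starting from the base file ending in `_001`, keep incrementing the three-digit counter until a file is missing. This is an approximation written against the description above, not Picard's code; `SequentialFastqs`, `list`, and `listOnDisk` are hypothetical names, and the two recognised extensions are taken from the USE_SEQUENTIAL_FASTQS documentation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical re-implementation for illustration; not Picard's code. */
final class SequentialFastqs {
    // Extensions named in the USE_SEQUENTIAL_FASTQS documentation.
    private static final String[] EXTENSIONS = {".fastq", ".fastq.gz"};

    /**
     * Returns the base path followed by _002, _003, ... for as long as
     * isReadable accepts the next name. The predicate must eventually
     * return false, otherwise the loop never ends.
     */
    static List<String> list(String basePath, Predicate<String> isReadable) {
        String extension = null;
        for (String ext : EXTENSIONS) {
            if (basePath.endsWith("_001" + ext)) {
                extension = ext;
                break;
            }
        }
        if (extension == null) {
            throw new IllegalArgumentException("base FASTQ must end with _001.<extension>");
        }
        String suffix = "_001" + extension;
        String prefix = basePath.substring(0, basePath.length() - suffix.length());

        List<String> files = new ArrayList<>();
        files.add(basePath);
        for (int idx = 2; ; idx++) {
            String next = String.format("%s_%03d%s", prefix, idx, extension);
            if (!isReadable.test(next)) {
                break; // first gap ends the sequence
            }
            files.add(next);
        }
        return files;
    }

    /** Convenience form that checks the file system. */
    static List<String> listOnDisk(File baseFastq) {
        return list(baseFastq.getAbsolutePath(), p -> new File(p).canRead());
    }

    public static void main(String[] argv) {
        System.out.println(list("RUNNAME_S8_L005_R1_001.fastq",
                p -> p.endsWith("_002.fastq")));
    }
}
```

Passing the readability check as a predicate keeps the naming logic testable without touching the file system. Note that the sequence stops at the first missing number, so a gap silently truncates the input set.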
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. RuntimeException may be thrown by this method, and is reported appropriately.
Specified by:
doWork in class CommandLineProgram
Returns:
program exit status.
-
makeItSo
public void makeItSo(htsjdk.samtools.fastq.FastqReader reader1, htsjdk.samtools.fastq.FastqReader reader2, htsjdk.samtools.SAMFileWriter writer)
Handles the FastqToSam execution on the FastqReader(s). In some circumstances it might be useful to bypass the command-line-based instantiation of this class; note, however, that there are no guardrails when running in this manner. It is the caller's responsibility to close the reader(s).
Parameters:
reader1 - The FastqReader for the first fastq file
reader2 - The second FastqReader if applicable. Pass in null if only using a single reader
writer - The SAMFileWriter where the new SAM file is written
-
doUnpaired
protected int doUnpaired(htsjdk.samtools.fastq.FastqReader freader, htsjdk.samtools.SAMFileWriter writer)
Creates a simple SAM file from a single fastq file.
-
doPaired
protected int doPaired(htsjdk.samtools.fastq.FastqReader freader1, htsjdk.samtools.fastq.FastqReader freader2, htsjdk.samtools.SAMFileWriter writer)
More complicated method that takes two fastq files and builds pairing information in the SAM.
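For unaligned paired reads, "pairing information" in SAM terms comes down to a shared read name and the flag bits defined in the SAM specification. The sketch below shows both under simplifying assumptions: `PairingSketch` is a hypothetical class, the name handling only strips a trailing `/1` or `/2`, and whether `doPaired` sets exactly these bits is assumed from the description rather than confirmed here.

```java
/** Hypothetical illustration of SAM pairing flags; not Picard's code. */
final class PairingSketch {
    // Flag bits as defined in the SAM specification.
    static final int PAIRED = 0x1;
    static final int UNMAPPED = 0x4;
    static final int MATE_UNMAPPED = 0x8;
    static final int FIRST_OF_PAIR = 0x40;
    static final int SECOND_OF_PAIR = 0x80;

    /** Flag for one read of an unaligned pair. */
    static int unalignedPairFlag(boolean firstOfPair) {
        return PAIRED | UNMAPPED | MATE_UNMAPPED
                | (firstOfPair ? FIRST_OF_PAIR : SECOND_OF_PAIR);
    }

    /** Strips a trailing "/1" or "/2" mate suffix, if present. */
    static String baseName(String readName) {
        if (readName.endsWith("/1") || readName.endsWith("/2")) {
            return readName.substring(0, readName.length() - 2);
        }
        return readName;
    }

    /** Returns the shared name of a pair, or throws if the two reads do not match. */
    static String pairName(String name1, String name2) {
        String base1 = baseName(name1);
        String base2 = baseName(name2);
        if (!base1.equals(base2)) {
            throw new IllegalArgumentException(
                    "read names do not match: " + name1 + " vs " + name2);
        }
        return base1;
    }

    public static void main(String[] argv) {
        System.out.println(pairName("read7/1", "read7/2")
                + " " + unalignedPairFlag(true)
                + " " + unalignedPairFlag(false));
    }
}
```

A name mismatch usually means the two FASTQ files are out of step, which is why the sketch fails fast rather than pairing by position alone.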
-
createSamFileHeader
public htsjdk.samtools.SAMFileHeader createSamFileHeader()
Creates a simple header with the values provided on the command line.
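To show how the command-line values map onto the header, here is a minimal sketch that formats a SAM `@RG` line from a read group name (ID), a sample name (SM), and optional two-letter tags such as LB, PU, PL, or CN. `ReadGroupLine` is a hypothetical helper that writes text directly; the real method returns an htsjdk `SAMFileHeader` object, and the tag order htsjdk emits may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical formatter for illustration; the tool builds a SAMFileHeader instead. */
final class ReadGroupLine {

    /**
     * Formats an @RG header line. Entries in optionalTags with a null value
     * are skipped, mirroring arguments that were not supplied.
     */
    static String format(String id, String sample, Map<String, String> optionalTags) {
        StringBuilder line = new StringBuilder("@RG");
        line.append("\tID:").append(id);
        line.append("\tSM:").append(sample);
        for (Map.Entry<String, String> tag : optionalTags.entrySet()) {
            if (tag.getValue() != null) {
                line.append('\t').append(tag.getKey()).append(':').append(tag.getValue());
            }
        }
        return line.toString();
    }

    public static void main(String[] argv) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("LB", "lib1");     // example values, not defaults
        tags.put("PL", "illumina");
        System.out.println(format("rg0013", "sample001", tags));
    }
}
```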
-
customCommandLineValidation
protected String[] customCommandLineValidation()
Description copied from class: CommandLineProgram
Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by the command-line parser can be validated.
Overrides:
customCommandLineValidation in class CommandLineProgram
Returns:
null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
-
-