Package org.biojavax.bio.seq
Class RichSequence.IOTools
java.lang.Object
org.biojavax.bio.seq.RichSequence.IOTools
- Enclosing interface:
RichSequence
A set of convenience methods for handling common file formats.
- Since:
- 1.5
- Author:
- Mark Schreiber, Richard Holland
-
Nested Class Summary
Nested Classes
Modifier and Type: static final class
Description: Used to iterate over a single rich sequence.
-
Method Summary
Modifier and Type, Method
    Description
static SymbolTokenization getDNAParser()
    Creates a DNA symbol tokenizer.
static SymbolTokenization getNucleotideParser()
    Creates a nucleotide symbol tokenizer.
static SymbolTokenization getProteinParser()
    Creates a protein symbol tokenizer.
static SymbolTokenization getRNAParser()
    Creates an RNA symbol tokenizer.
static RichSequenceIterator readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read an EMBL file using a custom type of SymbolList.
static RichSequenceIterator readEMBLDNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in an EMBL-format stream of DNA sequences.
static RichSequenceIterator readEMBLProtein(BufferedReader br, Namespace ns)
    Iterate over the sequences in an EMBL-format stream of protein sequences.
static RichSequenceIterator readEMBLRNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in an EMBL-format stream of RNA sequences.
static RichSequenceIterator readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read an EMBLxml file using a custom type of SymbolList.
static RichSequenceIterator readEMBLxmlDNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
static RichSequenceIterator readEMBLxmlProtein(BufferedReader br, Namespace ns)
    Iterate over the sequences in an EMBLxml-format stream of protein sequences.
static RichSequenceIterator readEMBLxmlRNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read a FASTA file, building a custom type of RichSequence.
static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns)
    Read a FASTA file.
static RichSequenceIterator readFastaDNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator readFastaProtein(BufferedReader br, Namespace ns)
    Iterate over the sequences in a FASTA-format stream of protein sequences.
static RichSequenceIterator readFastaRNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in a FASTA-format stream of RNA sequences.
static RichSequenceIterator readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Guess the format of a file, then attempt to read it.
static RichSequenceIterator readFile(File file, Namespace ns)
    Guess the format of a file, then attempt to read it.
static RichSequenceIterator readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read a GenBank file using a custom type of SymbolList.
static RichSequenceIterator readGenbankDNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in a GenBank-format stream of DNA sequences.
static RichSequenceIterator readGenbankProtein(BufferedReader br, Namespace ns)
    Iterate over the sequences in a GenBank-format stream of protein sequences.
static RichSequenceIterator readGenbankRNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in a GenBank-format stream of RNA sequences.
static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is, Namespace ns)
    Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read an INSDseq file using a custom type of SymbolList.
static RichSequenceIterator readINSDseqDNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in an INSDseq-format stream of DNA sequences.
static RichSequenceIterator readINSDseqProtein(BufferedReader br, Namespace ns)
    Iterate over the sequences in an INSDseq-format stream of protein sequences.
static RichSequenceIterator readINSDseqRNA(BufferedReader br, Namespace ns)
    Iterate over the sequences in an INSDseq-format stream of RNA sequences.
static RichSequenceIterator readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Guess the format of a stream, then attempt to read it.
static RichSequenceIterator readStream(BufferedInputStream stream, Namespace ns)
    Guess the format of a stream, then attempt to read it.
static RichSequenceIterator readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read a UniProt file using a custom type of SymbolList.
static RichSequenceIterator readUniProt(BufferedReader br, Namespace ns)
    Iterate over the sequences in a UniProt-format stream of protein sequences.
static RichSequenceIterator readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
    Read a UniProt XML file using a custom type of SymbolList.
static RichSequenceIterator readUniProtXML(BufferedReader br, Namespace ns)
    Iterate over the sequences in a UniProt XML-format stream of protein sequences.
static void registerFormat(Class formatClass)
    Register a new format with IOTools for auto-guessing.
static void writeEMBL(OutputStream os, SequenceIterator in, Namespace ns)
    Writes sequences from a SequenceIterator to an OutputStream in EMBL format.
static void writeEMBL(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in EMBL format.
static void writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns)
    Writes sequences from a SequenceIterator to an OutputStream in EMBLxml format.
static void writeEMBLxml(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in EMBLxml format.
static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns)
    Writes Sequences from a SequenceIterator to an OutputStream in FASTA format.
static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header)
    Writes Sequences from a SequenceIterator to an OutputStream in FASTA format.
static void writeFasta(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in FASTA format.
static void writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header)
    Writes a single Sequence to an OutputStream in FASTA format.
static void writeGenbank(OutputStream os, SequenceIterator in, Namespace ns)
    Writes sequences from a SequenceIterator to an OutputStream in GenBank format.
static void writeGenbank(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in GenBank format.
static void writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns)
    Writes sequences from a SequenceIterator to an OutputStream in INSDseq format.
static void writeINSDseq(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in INSDseq format.
static void writeUniProt(OutputStream os, SequenceIterator in, Namespace ns)
    Writes sequences from a SequenceIterator to an OutputStream in UniProt format.
static void writeUniProt(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in UniProt format.
static void writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns)
    Writes sequences from a SequenceIterator to an OutputStream in UniProt XML format.
static void writeUniProtXML(OutputStream os, Sequence seq, Namespace ns)
    Writes a single Sequence to an OutputStream in UniProt XML format.
-
Method Details
-
registerFormat
public static void registerFormat(Class formatClass)
Register a new format with IOTools for auto-guessing.
- Parameters:
- formatClass - the RichSequenceFormat class to register.
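The idea of registering formats for auto-guessing can be illustrated without BioJava. The following is a minimal stand-alone sketch, not the library's implementation: `FormatRegistrySketch`, `Sniffer`, and the prefix rule are hypothetical names invented for this example, and instances are registered here rather than a `Class`. It keeps a list of candidate formats and asks each in turn whether it recognises the first line of the input.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration only; none of these types exist in BioJava.
class FormatRegistrySketch {

    // Hypothetical contract: a format reports whether it recognises a first line.
    interface Sniffer {
        String name();
        boolean canRead(String firstLine);
    }

    private static final List<Sniffer> FORMATS = new ArrayList<Sniffer>();

    // Similar in spirit to registering a format for auto-guessing.
    static void register(Sniffer s) {
        FORMATS.add(s);
    }

    // Returns the name of the first registered format that accepts the line,
    // or null if no registered format does.
    static String guess(String firstLine) {
        for (Sniffer s : FORMATS) {
            if (s.canRead(firstLine)) {
                return s.name();
            }
        }
        return null;
    }

    // Helper that builds a prefix-based sniffer (an assumed, simplistic rule).
    static Sniffer prefix(final String name, final String prefix) {
        return new Sniffer() {
            public String name() { return name; }
            public boolean canRead(String firstLine) {
                return firstLine != null && firstLine.startsWith(prefix);
            }
        };
    }
}
```

In this sketch, formats registered earlier take precedence when more than one would accept the same input.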
-
readStream
public static RichSequenceIterator readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns) throws IOException
Guess the format of a stream, then attempt to read it.
- Parameters:
- stream - the BufferedInputStream to attempt to read.
- seqFactory - a factory used to build a RichSequence.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the stream.
- Throws:
- IOException - if the stream is unrecognisable or problems occur while reading it.
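One plausible reason a guessing reader takes a `BufferedInputStream` is that such a stream supports `mark` and `reset`, so its first bytes can be inspected and then replayed to the parser. The sketch below is a stand-alone illustration of that pattern using only the JDK; the class name `StreamPeekSketch` and the prefix rules are assumptions made for this example and are not taken from BioJava.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-alone illustration only; BioJava's real guessing rules may differ.
class StreamPeekSketch {

    // Look at the first bytes of the stream, then rewind so that a parser
    // can still read the stream from the beginning.
    static String guessFormat(BufferedInputStream in) {
        final int limit = 16;
        try {
            in.mark(limit);                       // remember the current position
            byte[] head = new byte[limit];
            int n = 0;
            while (n < limit) {
                int r = in.read(head, n, limit - n);
                if (r < 0) {
                    break;                        // end of stream
                }
                n += r;
            }
            in.reset();                           // rewind to the marked position
            String start = new String(head, 0, n, StandardCharsets.US_ASCII);
            // Assumed, simplified signatures for a few formats.
            if (start.startsWith(">")) return "fasta";
            if (start.startsWith("LOCUS")) return "genbank";
            if (start.startsWith("ID ")) return "embl-or-uniprot";
            if (start.startsWith("<")) return "xml";
            return "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = ">seq1\nACGT\n".getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(guessFormat(in));
    }
}
```

Because the stream is reset after peeking, the same stream object can be handed to a format-specific reader afterwards without losing the bytes that were inspected.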
-
readStream
public static RichSequenceIterator readStream(BufferedInputStream stream, Namespace ns) throws IOException
Guess the format of a stream, then attempt to read it.
- Parameters:
- stream - the BufferedInputStream to attempt to read.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the stream.
- Throws:
- IOException - if the stream cannot be read.
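The namespace rule repeated throughout the reader methods is a three-step fallback. As a rough stand-alone sketch (plain strings stand in for Namespace objects, and `NamespaceChoiceSketch` and `resolve` are hypothetical names, not BioJava API), it can be written as:

```java
// Stand-alone illustration only; plain strings stand in for Namespace objects.
class NamespaceChoiceSketch {

    // requested:   the ns argument passed by the caller (may be null)
    // fromFile:    the namespace named in the file, if any (may be null)
    // defaultName: stand-in for the library-wide default namespace
    static String resolve(String requested, String fromFile, String defaultName) {
        if (requested != null) {
            return requested;      // an explicit namespace is used as given
        }
        if (fromFile != null) {
            return fromFile;       // null argument: fall back to the file
        }
        return defaultName;        // nothing in the file: use the default
    }
}
```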
-
readFile
public static RichSequenceIterator readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns) throws IOException
Guess the format of a file, then attempt to read it.
- Parameters:
- file - the File to attempt to read.
- seqFactory - a factory used to build a RichSequence.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the file.
- Throws:
- IOException - if the file is unrecognisable or problems occur while reading it.
-
readFile
public static RichSequenceIterator readFile(File file, Namespace ns) throws IOException
Guess the format of a file, then attempt to read it.
- Parameters:
- file - the File to attempt to read.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the file.
- Throws:
- IOException - if the file cannot be read.
-
readFasta
public static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns)
Read a FASTA file.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file.
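The reader methods return an iterator rather than a list, so records can be consumed one at a time. The sketch below shows that streaming pattern for FASTA text using only the JDK. It is an illustration under stated assumptions, not BioJava's parser: `FastaIteratorSketch` is a hypothetical class, and each record is returned as a two-element array of header and residues instead of a RichSequence.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Stand-alone illustration only; this is not BioJava's FASTA parser.
class FastaIteratorSketch implements Iterator<String[]> {

    private final BufferedReader br;
    private String pendingHeader;   // header of the next record, without '>'

    FastaIteratorSketch(BufferedReader br) {
        this.br = br;
        this.pendingHeader = advanceToHeader();
    }

    // Skip lines until the next '>' header; return it, or null at end of input.
    private String advanceToHeader() {
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line.substring(1).trim();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean hasNext() {
        return pendingHeader != null;
    }

    // Returns {header, residues} for the next record.
    public String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String header = pendingHeader;
        StringBuilder residues = new StringBuilder();
        pendingHeader = null;
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {          // start of the following record
                    pendingHeader = line.substring(1).trim();
                    break;
                }
                residues.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { header, residues.toString() };
    }
}
```

Only one record is held in memory at a time, which is the main reason to prefer an iterator over reading a whole file into a collection.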
-
readFasta
public static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a FASTA file, building a custom type of RichSequence. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a RichSequence.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file.
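To give a feel for what bit-packing of symbols means, the sketch below stores unambiguous DNA at two bits per symbol. It is a stand-alone illustration only: `TwoBitPackSketch` is a hypothetical class, and the layout used by RichSequenceBuilderFactory.PACKED is not described in this document and may well differ, for instance to accommodate ambiguity symbols.

```java
// Stand-alone illustration only; BioJava's packed representation may differ
// (for example, it may also need to encode ambiguity symbols).
class TwoBitPackSketch {

    private static final String ALPHABET = "acgt";   // 2 bits cover 4 symbols

    // Pack an unambiguous DNA string, four symbols per byte.
    static byte[] pack(String dna) {
        byte[] out = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int code = ALPHABET.indexOf(Character.toLowerCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("not a, c, g or t: " + dna.charAt(i));
            }
            // Symbol i occupies bits (i % 4) * 2 and (i % 4) * 2 + 1 of byte i / 4.
            out[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
        return out;
    }

    // Recover the symbol at position i (0-based) from the packed array.
    static char symbolAt(byte[] packed, int i) {
        int code = (packed[i / 4] >> ((i % 4) * 2)) & 0x3;
        return ALPHABET.charAt(code);
    }
}
```

The trade-off is the usual one: a packed sequence uses roughly a quarter of the memory of one byte per symbol, at the cost of a shift and mask on every access.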
-
readFastaDNA
public static RichSequenceIterator readFastaDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of DNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file.
-
readHashedFastaDNA
public static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is, Namespace ns) throws BioException
Iterate over the sequences in a FASTA-format stream of DNA sequences. In contrast to readFastaDNA, this provides a faster implementation in which all sequences are accessed from memory.
- Parameters:
- is - the BufferedInputStream to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file.
- Throws:
- BioException - if something goes wrong while reading the file.
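The contrast with streaming is that every record is loaded up front and then served from memory. The following stand-alone sketch shows that general approach with a map keyed by sequence name. It is not how readHashedFastaDNA is implemented; `InMemoryFastaSketch` is a hypothetical class, and keying by the first word of the header is an assumption made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone illustration only; not the library's hashed reader.
class InMemoryFastaSketch {

    // Read every record up front into a map keyed by the first word of the
    // header line. Insertion order is preserved for later iteration.
    static Map<String, String> load(BufferedReader br) {
        Map<String, String> records = new LinkedHashMap<String, String>();
        String name = null;
        StringBuilder residues = new StringBuilder();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (name != null) {
                        records.put(name, residues.toString());   // finish previous record
                    }
                    name = line.substring(1).trim().split("\\s+")[0];
                    residues.setLength(0);
                } else if (name != null) {
                    residues.append(line.trim());
                }
            }
            if (name != null) {
                records.put(name, residues.toString());           // finish last record
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Lookups are then constant-time on average, but the whole input must fit in memory, which is the cost of this style of access.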
-
readFastaRNA
public static RichSequenceIterator readFastaRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of RNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file.
-
readFastaProtein
public static RichSequenceIterator readFastaProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in a FASTA-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the FASTA file.
-
readGenbank
public static RichSequenceIterator readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a GenBank file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a SymbolList.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file.
-
readGenbankDNA
public static RichSequenceIterator readGenbankDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of DNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file.
-
readGenbankRNA
public static RichSequenceIterator readGenbankRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of RNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file.
-
readGenbankProtein
public static RichSequenceIterator readGenbankProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in a GenBank-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the GenBank file.
-
readINSDseq
public static RichSequenceIterator readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an INSDseq file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a SymbolList.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file.
-
readINSDseqDNA
public static RichSequenceIterator readINSDseqDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of DNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file.
-
readINSDseqRNA
public static RichSequenceIterator readINSDseqRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of RNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file.
-
readINSDseqProtein
public static RichSequenceIterator readINSDseqProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an INSDseq-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the INSDseq file.
-
readEMBLxml
public static RichSequenceIterator readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an EMBLxml file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a SymbolList.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file.
-
readEMBLxmlDNA
public static RichSequenceIterator readEMBLxmlDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file.
-
readEMBLxmlRNA
public static RichSequenceIterator readEMBLxmlRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file.
-
readEMBLxmlProtein
public static RichSequenceIterator readEMBLxmlProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBLxml file.
-
readEMBL
public static RichSequenceIterator readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read an EMBL file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a SymbolList.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file.
-
readEMBLDNA
public static RichSequenceIterator readEMBLDNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of DNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file.
-
readEMBLRNA
public static RichSequenceIterator readEMBLRNA(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of RNA sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file.
-
readEMBLProtein
public static RichSequenceIterator readEMBLProtein(BufferedReader br, Namespace ns)
Iterate over the sequences in an EMBL-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the EMBL file.
-
readUniProt
public static RichSequenceIterator readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a UniProt file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a SymbolList.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt file.
-
readUniProt
public static RichSequenceIterator readUniProt(BufferedReader br, Namespace ns)
Iterate over the sequences in a UniProt-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt file.
-
readUniProtXML
public static RichSequenceIterator readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
Read a UniProt XML file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace), or RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.
- Parameters:
- br - the BufferedReader to read data from.
- sTok - a SymbolTokenization that understands the sequences.
- seqFactory - a factory used to build a SymbolList.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt XML file.
-
readUniProtXML
public static RichSequenceIterator readUniProtXML(BufferedReader br, Namespace ns)
Iterate over the sequences in a UniProt XML-format stream of protein sequences.
- Parameters:
- br - the BufferedReader to read data from.
- ns - a Namespace to load the sequences into. If null, the namespace specified in the file is used; if the file specifies none, RichObjectFactory.getDefaultNamespace() is used.
- Returns:
- a RichSequenceIterator over each sequence in the UniProt XML file.
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header) throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write FASTA-formatted data to.
- in - the source of input RichSequences.
- ns - a Namespace to write the RichSequences to. If null, the namespace specified in each individual sequence is used.
- header - the FastaHeader.
- Throws:
- IOException - if there is an IO problem.
-
writeFasta
public static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in FASTA format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write FASTA-formatted data to.
- in - the source of input RichSequences.
- ns - a Namespace to write the RichSequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
writeFasta
public static void writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header) throws IOException
Writes a single Sequence to an OutputStream in FASTA format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- header - a FastaHeader that controls the fields in the header.
- Throws:
- IOException - if there is an IO problem.
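To illustrate what it means for a header object to control which fields appear on the FASTA header line, here is a stand-alone sketch that writes one record to an OutputStream. Everything in it is an assumption made for the example: `FastaWriteSketch` and `HeaderOptions` are hypothetical and do not mirror the real FastaHeader API, and the `namespace|name description` layout and the line width are illustrative choices.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-alone illustration only; HeaderOptions is a hypothetical stand-in and
// does not mirror the real FastaHeader API.
class FastaWriteSketch {

    // Flags deciding which optional fields are printed on the header line.
    static class HeaderOptions {
        boolean showNamespace = true;
        boolean showDescription = true;
    }

    // Write one record: a '>' header line, then residues wrapped at lineWidth.
    static void write(OutputStream os, String namespace, String name,
                      String description, String residues,
                      HeaderOptions opts, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        StringBuilder sb = new StringBuilder(">");
        if (opts.showNamespace && namespace != null) {
            sb.append(namespace).append('|');
        }
        sb.append(name);
        if (opts.showDescription && description != null) {
            sb.append(' ').append(description);
        }
        sb.append('\n');
        for (int i = 0; i < residues.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, residues.length());
            sb.append(residues, i, end).append('\n');
        }
        try {
            os.write(sb.toString().getBytes(StandardCharsets.US_ASCII));
            os.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the header choices in a separate options object means the same writing code can produce terse or verbose headers without extra overloads.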
-
writeGenbank
public static void writeGenbank(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in GenBank format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write GenBank-formatted data to.
- in - the source of input sequences.
- ns - a Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeGenbank
public static void writeGenbank(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in GenBank format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
writeINSDseq
public static void writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in INSDseq format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write INSDseq-formatted data to.
- in - the source of input sequences.
- ns - a Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeINSDseq
public static void writeINSDseq(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in INSDseq format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
writeEMBLxml
public static void writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in EMBLxml format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write EMBLxml-formatted data to.
- in - the source of input sequences.
- ns - a Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeEMBLxml
public static void writeEMBLxml(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in EMBLxml format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
writeEMBL
public static void writeEMBL(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in EMBL format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write EMBL-formatted data to.
- in - the source of input sequences.
- ns - a Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeEMBL
public static void writeEMBL(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in EMBL format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
writeUniProt
public static void writeUniProt(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in UniProt format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write UniProt-formatted data to.
- in - the source of input sequences.
- ns - a Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeUniProt
public static void writeUniProt(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in UniProt format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
writeUniProtXML
public static void writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns) throws IOException
Writes sequences from a SequenceIterator to an OutputStream in UniProt XML format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.
- Parameters:
- os - the stream to write UniProt XML-formatted data to.
- in - the source of input sequences.
- ns - a Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
- Throws:
- IOException - if there is an IO problem.
-
writeUniProtXML
public static void writeUniProtXML(OutputStream os, Sequence seq, Namespace ns) throws IOException
Writes a single Sequence to an OutputStream in UniProt XML format.
- Parameters:
- os - the OutputStream.
- seq - the Sequence.
- ns - a Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
- Throws:
- IOException - if there is an IO problem.
-
getDNAParser
public static SymbolTokenization getDNAParser()
Creates a DNA symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing DNA.
-
getRNAParser
public static SymbolTokenization getRNAParser()
Creates an RNA symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing RNA.
-
getNucleotideParser
public static SymbolTokenization getNucleotideParser()
Creates a nucleotide symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing nucleotides.
-
getProteinParser
public static SymbolTokenization getProteinParser()
Creates a protein symbol tokenizer.
- Returns:
- a SymbolTokenization for parsing protein.
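In general terms, a symbol tokenizer maps the characters of a text representation onto the symbols of an alphabet and rejects characters that do not belong to it. The sketch below shows that idea for unambiguous DNA using only the JDK. It is a simplified assumption-laden illustration: `DnaTokenSketch` and its `Base` enum are hypothetical and do not reproduce SymbolTokenization or BioJava's alphabets.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration only; the enum and method are hypothetical and do
// not reproduce SymbolTokenization.
class DnaTokenSketch {

    enum Base { A, C, G, T }

    // Convert each character to a Base, rejecting anything outside the alphabet.
    static List<Base> tokenize(String text) {
        List<Base> symbols = new ArrayList<Base>();
        for (int i = 0; i < text.length(); i++) {
            char c = Character.toUpperCase(text.charAt(i));
            switch (c) {
                case 'A': symbols.add(Base.A); break;
                case 'C': symbols.add(Base.C); break;
                case 'G': symbols.add(Base.G); break;
                case 'T': symbols.add(Base.T); break;
                default:
                    throw new IllegalArgumentException(
                            "not a DNA symbol: " + text.charAt(i));
            }
        }
        return symbols;
    }
}
```

Choosing the tokenizer is therefore what decides whether a given character, such as `U`, is accepted as part of the sequence or reported as an error.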
-