Class SeqIOTools

java.lang.Object
org.biojava.bio.seq.io.SeqIOTools

public final class SeqIOTools extends Object
Deprecated.
use org.biojavax.bio.seq.RichSequence.IOTools
A set of convenience methods for handling common file formats.
Since:
1.1
Author:
Thomas Down, Mark Schreiber, Nimesh Singh, Matthew Pocock, Keith James
  • Method Details

    • getEmblBuilderFactory

      Deprecated.
      Get a default SequenceBuilderFactory for handling EMBL files.
      Returns:
      a SmartSequenceBuilder.FACTORY
    • readEmbl

      Deprecated.
      Iterate over the sequences in an EMBL-format stream.
      Parameters:
      br - A reader for the EMBL source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • readEmblRNA

      Deprecated.
      Iterate over the sequences in an EMBL-format stream, but for RNA.
      Parameters:
      br - A reader for the EMBL source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • readEmblNucleotide

      Deprecated.
      Iterate over the sequences in an EMBL-format stream.
      Parameters:
      br - A reader for the EMBL source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • getGenbankBuilderFactory

      Deprecated.
      Get a default SequenceBuilderFactory for handling GenBank files.
      Returns:
      a SmartSequenceBuilder.FACTORY
    • readGenbank

      Deprecated.
      Iterate over the sequences in a Genbank-format stream.
      Parameters:
      br - A reader for the Genbank source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • readGenbankXml

      Deprecated.
      Iterate over the sequences in a GenbankXML-format stream.
      Parameters:
      br - A reader for the GenbankXML source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • getGenpeptBuilderFactory

      Deprecated.
      Get a default SequenceBuilderFactory for handling Genpept files.
      Returns:
      a SmartSequenceBuilder.FACTORY
    • readGenpept

      Deprecated.
      Iterate over the sequences in a Genpept-format stream.
      Parameters:
      br - A reader for the Genpept source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • getSwissprotBuilderFactory

      Deprecated.
      Get a default SequenceBuilderFactory for handling Swissprot files.
      Returns:
      a SmartSequenceBuilder.FACTORY
    • readSwissprot

      Deprecated.
      Iterate over the sequences in a Swissprot-format stream.
      Parameters:
      br - A reader for the Swissprot source or file
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • getFastaBuilderFactory

      Deprecated.
      Get a default SequenceBuilderFactory for handling FASTA files.
      Returns:
      a SmartSequenceBuilder.FACTORY
    • readFasta

      Deprecated.
      Read a fasta file.
      Parameters:
      br - the BufferedReader to read data from
      sTok - a SymbolTokenization that understands the sequences
      Returns:
      a SequenceIterator over each sequence in the fasta file
    • readFasta

      Deprecated.
      Read a fasta file using a custom type of SymbolList. For example, use SmartSequenceBuilder.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization) and SmartSequenceBuilder.BIT_PACKED to force all symbols to be encoded using bit-packing.
      Parameters:
      br - the BufferedReader to read data from
      sTok - a SymbolTokenization that understands the sequences
      seqFactory - a factory used to build a SymbolList
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
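      To make the term "bit-packing" concrete, the following is a minimal, self-contained sketch of storing unambiguous DNA symbols in two bits each. It is a general illustration of the idea only: the class BitPackSketch and its methods are hypothetical, and nothing here is taken from the packed storage that SmartSequenceBuilder.BIT_PACKED actually uses.

```java
/**
 * Hypothetical illustration of 2-bit packing for the four unambiguous
 * DNA symbols. Not BioJava code.
 */
final class BitPackSketch {
    // Symbol order defines the 2-bit codes: a=0, c=1, g=2, t=3 (an assumption).
    private static final String ALPHABET = "acgt";

    /** Packs four symbols into each byte, lowest bits first. */
    static byte[] pack(String dna) {
        byte[] packed = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int code = ALPHABET.indexOf(Character.toLowerCase(dna.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException(
                        "Not an unambiguous DNA symbol: " + dna.charAt(i));
            }
            packed[i / 4] |= code << ((i % 4) * 2);
        }
        return packed;
    }

    /** Reads back the symbol stored at the given sequence index. */
    static char symbolAt(byte[] packed, int index) {
        int code = (packed[index / 4] >> ((index % 4) * 2)) & 0x3;
        return ALPHABET.charAt(code);
    }
}
```

      Packing this way stores four symbols per byte, which is why a packed representation can be attractive for long sequences. The trade-off in this sketch is that ambiguity symbols are rejected outright.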
    • readFastaDNA

      Deprecated.
      Iterate over the sequences in a FASTA-format stream of DNA sequences.
      Parameters:
      br - the BufferedReader to read data from
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • readFastaRNA

      Deprecated.
      Iterate over the sequences in a FASTA-format stream of RNA sequences.
      Parameters:
      br - the BufferedReader to read data from
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • readFastaProtein

      Deprecated.
      Iterate over the sequences in a FASTA-format stream of Protein sequences.
      Parameters:
      br - the BufferedReader to read data from
      Returns:
      a SequenceIterator that iterates over each Sequence in the file
    • readFasta

      public static SequenceDB readFasta(InputStream seqFile, Alphabet alpha) throws BioException
      Deprecated.
      Create a sequence database from a FASTA file provided as an input stream. This partly duplicates the readFastaDNA and readFastaProtein methods, but it takes a stream rather than a reader and returns a SequenceDB rather than a SequenceIterator. If the resulting DB is likely to be large, use those iterator-based methods instead.
      Parameters:
      seqFile - The stream containing the FASTA-formatted sequences
      alpha - The Alphabet of the sequences, e.g. DNA, RNA, etc.
      Returns:
      a SequenceDB containing all the Sequences in the file.
      Throws:
      BioException - if problems occur during reading of the stream.
      Since:
      1.2
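      The trade-off described above, iterating over records versus loading them all into a database, can be sketched with plain JDK classes. This is a simplified stand-in, not BioJava code: FastaStreamSketch, FastaEntry and readAll are hypothetical names, sequences are held as plain strings, and a Map plays the role of the SequenceDB.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical streaming FASTA reader; holds one record in memory at a time. */
final class FastaStreamSketch {
    /** One record: the header line without '>' and the joined residue lines. */
    static final class FastaEntry {
        final String name;
        final String residues;

        FastaEntry(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    private final BufferedReader in;
    private String pendingHeader; // header of the next record, or null at end of input

    FastaStreamSketch(BufferedReader in) throws IOException {
        this.in = in;
        String line;
        // Skip anything before the first header line.
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                pendingHeader = line.substring(1).trim();
                break;
            }
        }
    }

    boolean hasNext() {
        return pendingHeader != null;
    }

    FastaEntry next() throws IOException {
        if (pendingHeader == null) {
            throw new NoSuchElementException("No more FASTA records");
        }
        String name = pendingHeader;
        pendingHeader = null;
        StringBuilder residues = new StringBuilder();
        String line;
        // Collect residue lines until the next header or end of input.
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                pendingHeader = line.substring(1).trim();
                break;
            }
            residues.append(line.trim());
        }
        return new FastaEntry(name, residues.toString());
    }

    /** Database-style alternative: loads every record into memory, keyed by name. */
    static Map<String, String> readAll(BufferedReader in) throws IOException {
        Map<String, String> db = new LinkedHashMap<>();
        FastaStreamSketch it = new FastaStreamSketch(in);
        while (it.hasNext()) {
            FastaEntry entry = it.next();
            db.put(entry.name, entry.residues);
        }
        return db;
    }
}
```

      In this sketch the hasNext/next pair keeps only the current record in memory, while readAll grows with the size of the input, which mirrors the reason the description recommends the iterator-based methods for large files.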
    • writeFasta

      public static void writeFasta(OutputStream os, SequenceDB db) throws IOException
      Deprecated.
      Write a SequenceDB to an output stream in FASTA format.
      Parameters:
      os - the stream to write the fasta formatted data to.
      db - the database of Sequences to write
      Throws:
      IOException - if there was an error while writing.
      Since:
      1.2
    • writeFasta

      public static void writeFasta(OutputStream os, SequenceIterator in) throws IOException
      Deprecated.
      Writes sequences from a SequenceIterator to an OutputStream in Fasta Format. This makes for a useful format filter where a StreamReader can be sent to the StreamWriter after formatting.
      Parameters:
      os - The stream to write fasta formatted data to
      in - The source of input Sequences
      Throws:
      IOException - if there was an error while writing.
      Since:
      1.2
    • writeFasta

      public static void writeFasta(OutputStream os, Sequence seq) throws IOException
      Deprecated.
      Writes a single Sequence to an OutputStream in Fasta format.
      Parameters:
      os - the OutputStream.
      seq - the Sequence.
      Throws:
      IOException - if there was an error while writing.
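      As a rough picture of what writing FASTA to an OutputStream involves, here is a minimal sketch using only JDK classes. FastaWriteSketch is a hypothetical name, the sequence is a plain string rather than a Sequence, and the 60-column line width is an assumption, not a documented property of writeFasta.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical FASTA writer sketch. Not BioJava code. */
final class FastaWriteSketch {
    // Assumed residue line width; the real writer's width is not stated in this page.
    static final int LINE_WIDTH = 60;

    /** Writes one record: a '>' header line followed by wrapped residue lines. */
    static void write(OutputStream os, String name, String residues) throws IOException {
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(name).append('\n');
        for (int i = 0; i < residues.length(); i += LINE_WIDTH) {
            int end = Math.min(i + LINE_WIDTH, residues.length());
            sb.append(residues, i, end).append('\n');
        }
        os.write(sb.toString().getBytes(StandardCharsets.US_ASCII));
    }
}
```

      Calling write once per record in a loop over an input source gives the kind of format filter mentioned for the SequenceIterator overload: records are read in one format and emitted as FASTA without holding the whole data set in memory.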
    • writeEmbl

      public static void writeEmbl(OutputStream os, SequenceIterator in) throws IOException
      Deprecated.
      Writes a stream of Sequences to an OutputStream in EMBL format.
      Parameters:
      os - the OutputStream.
      in - a SequenceIterator.
      Throws:
      IOException - if there was an error while writing.
    • writeEmbl

      public static void writeEmbl(OutputStream os, Sequence seq) throws IOException
      Deprecated.
      Writes a single Sequence to an OutputStream in EMBL format.
      Parameters:
      os - the OutputStream.
      seq - the Sequence.
      Throws:
      IOException - if there was an error while writing.
    • writeSwissprot

      Deprecated.
      Writes a stream of Sequences to an OutputStream in SwissProt format.
      Parameters:
      os - the OutputStream.
      in - a SequenceIterator.
      Throws:
      BioException - if the Sequence cannot be converted to SwissProt format
      IOException - if there was an error while writing.
    • writeSwissprot

      public static void writeSwissprot(OutputStream os, Sequence seq) throws IOException, BioException
      Deprecated.
      Writes a single Sequence to an OutputStream in SwissProt format.
      Parameters:
      os - the OutputStream.
      seq - the Sequence.
      Throws:
      BioException - if the Sequence cannot be written to SwissProt format
      IOException - if there was an error while writing.
    • writeGenpept

      public static void writeGenpept(OutputStream os, SequenceIterator in) throws IOException, BioException
      Deprecated.
      Writes a stream of Sequences to an OutputStream in Genpept format.
      Parameters:
      os - the OutputStream.
      in - a SequenceIterator.
      Throws:
      BioException - if the Sequence cannot be written to Genpept format
      IOException - if there was an error while writing.
    • writeGenpept

      public static void writeGenpept(OutputStream os, Sequence seq) throws IOException, BioException
      Deprecated.
      Writes a single Sequence to an OutputStream in Genpept format.
      Parameters:
      os - the OutputStream.
      seq - the Sequence.
      Throws:
      BioException - if the Sequence cannot be written to Genpept format
      IOException - if there was an error while writing.
    • writeGenbank

      public static void writeGenbank(OutputStream os, SequenceIterator in) throws IOException
      Deprecated.
      Writes a stream of Sequences to an OutputStream in Genbank format.
      Parameters:
      os - the OutputStream.
      in - a SequenceIterator.
      Throws:
      IOException - if there was an error while writing.
    • writeGenbank

      public static void writeGenbank(OutputStream os, Sequence seq) throws IOException
      Deprecated.
      Writes a single Sequence to an OutputStream in Genbank format.
      Parameters:
      os - the OutputStream.
      seq - the Sequence.
      Throws:
      IOException - if there was an error while writing.
    • identifyFormat

      public static int identifyFormat(String formatName, String alphabetName)
      Deprecated.
      identifyFormat performs a case-insensitive mapping of a pair of names, a common sequence format name (such as 'embl', 'genbank' or 'fasta') and an alphabet name (such as 'dna', 'rna', 'protein' or 'aa'), to an integer. The value returned will be one of the public static final fields in SeqIOConstants, or a bitwise-or combination of them. The method rejects known illegal combinations of format and alphabet (such as swissprot + dna) by throwing an IllegalArgumentException. It returns the SeqIOConstants.UNKNOWN value when either the format or the alphabet is unknown.
      Parameters:
      formatName - a String.
      alphabetName - a String.
      Returns:
      an int.
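      The mapping described for identifyFormat can be sketched as follows. This is a hypothetical stand-in rather than the BioJava implementation: the class FormatIdSketch, the constant values, the bit layout and the list of recognised names are all assumptions chosen for illustration, and only the three behaviours stated above (case-insensitive lookup, UNKNOWN for unrecognised names, IllegalArgumentException for an illegal pair) are modelled.

```java
import java.util.Locale;

/**
 * Hypothetical sketch only: the constant values and recognised names are
 * illustrative and are not copied from SeqIOConstants.
 */
final class FormatIdSketch {
    static final int UNKNOWN = 0;

    // Alphabet bits (assumed layout: low byte).
    static final int DNA = 1 << 0;
    static final int RNA = 1 << 1;
    static final int AA  = 1 << 2;

    // Format bits (assumed layout: second byte).
    static final int FASTA     = 1 << 8;
    static final int EMBL      = 1 << 9;
    static final int GENBANK   = 1 << 10;
    static final int SWISSPROT = 1 << 11;

    static int identify(String formatName, String alphabetName) {
        int format = formatBits(formatName.toLowerCase(Locale.ROOT));
        int alphabet = alphabetBits(alphabetName.toLowerCase(Locale.ROOT));
        if (format == UNKNOWN || alphabet == UNKNOWN) {
            return UNKNOWN;
        }
        // One known illegal pairing from the description: swissprot with a non-protein alphabet.
        if (format == SWISSPROT && alphabet != AA) {
            throw new IllegalArgumentException(
                    "Illegal combination: " + formatName + " + " + alphabetName);
        }
        return format | alphabet;
    }

    private static int formatBits(String name) {
        switch (name) {
            case "fasta":     return FASTA;
            case "embl":      return EMBL;
            case "genbank":   return GENBANK;
            case "swissprot": return SWISSPROT;
            default:          return UNKNOWN;
        }
    }

    private static int alphabetBits(String name) {
        switch (name) {
            case "dna":     return DNA;
            case "rna":     return RNA;
            case "protein":
            case "aa":      return AA;
            default:        return UNKNOWN;
        }
    }
}
```

      With this layout, identify("FASTA", "dna") yields FASTA | DNA, an unrecognised name yields UNKNOWN, and identify("swissprot", "dna") throws.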
    • getSequenceFormat

      public static SequenceFormat getSequenceFormat(int identifier) throws BioException
      Deprecated.
      getSequenceFormat accepts a value which represents a sequence format and returns the relevant SequenceFormat object.
      Parameters:
      identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
      Returns:
      a SequenceFormat.
      Throws:
      BioException - if an error occurs.
    • getBuilderFactory

      public static SequenceBuilderFactory getBuilderFactory(int identifier) throws BioException
      Deprecated.
      getBuilderFactory accepts a value which represents a sequence format and returns the relevant SequenceBuilderFactory object.
      Parameters:
      identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
      Returns:
      a SequenceBuilderFactory.
      Throws:
      BioException - if an error occurs.
    • getAlphabet

      public static FiniteAlphabet getAlphabet(int identifier) throws BioException
      Deprecated.
      getAlphabet accepts a value which represents a sequence format and returns the relevant FiniteAlphabet object.
      Parameters:
      identifier - an int which represents a binary value with bits set according to the scheme described in SeqIOConstants.
      Returns:
      a FiniteAlphabet.
      Throws:
      BioException - if an error occurs.
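      getSequenceFormat, getBuilderFactory and getAlphabet all take an int whose bits encode a format and an alphabet. One way to picture the decoding step is with bit masks, as in the sketch below. The masks, constant values and class name are hypothetical and do not reproduce the scheme in SeqIOConstants; the methods return plain names instead of SequenceFormat or FiniteAlphabet objects, and IllegalArgumentException stands in for BioException.

```java
/**
 * Hypothetical decoding sketch; the bit layout is assumed and is not the
 * scheme defined in SeqIOConstants.
 */
final class IdentifierDecodeSketch {
    // Assumed layout: alphabet in the low byte, format in the second byte.
    static final int ALPHABET_MASK = 0x00FF;
    static final int FORMAT_MASK   = 0xFF00;

    static final int DNA = 1 << 0;
    static final int RNA = 1 << 1;
    static final int AA  = 1 << 2;

    static final int FASTA   = 1 << 8;
    static final int EMBL    = 1 << 9;
    static final int GENBANK = 1 << 10;

    /** Extracts the alphabet part of the identifier. */
    static String alphabetName(int identifier) {
        switch (identifier & ALPHABET_MASK) {
            case DNA: return "DNA";
            case RNA: return "RNA";
            case AA:  return "PROTEIN";
            default:
                throw new IllegalArgumentException("No alphabet bits recognised in " + identifier);
        }
    }

    /** Extracts the format part of the identifier. */
    static String formatName(int identifier) {
        switch (identifier & FORMAT_MASK) {
            case FASTA:   return "FASTA";
            case EMBL:    return "EMBL";
            case GENBANK: return "GENBANK";
            default:
                throw new IllegalArgumentException("No format bits recognised in " + identifier);
        }
    }
}
```

      Because the two parts occupy disjoint bits in this sketch, a single int such as EMBL | DNA can be passed to either method and each reads only the part it needs.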
    • guessFileType

      public static int guessFileType(File seqFile) throws IOException, FileNotFoundException
      Deprecated.
      because there is no standard file naming convention and guessing by file name is inherently error-prone.
      Attempts to guess the filetype of a file given the name. For use with the functions below that take an int fileType as a parameter. EMBL and Genbank files are assumed to contain DNA sequence.
      Parameters:
      seqFile - the File to read from.
      Returns:
      a value that describes the file type.
      Throws:
      IOException - if seqFile cannot be read
      FileNotFoundException - if seqFile cannot be found
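      The deprecation note is easier to appreciate with a small example of name-based guessing. The sketch below is hypothetical: the class name, the Guess values and the extension list are assumptions and are not the rules guessFileType applies. It follows the stated convention that EMBL and Genbank files are assumed to contain DNA.

```java
import java.util.Locale;

/** Hypothetical extension-based guesser. Not BioJava code. */
final class FileTypeGuessSketch {
    enum Guess { FASTA, EMBL_DNA, GENBANK_DNA, UNKNOWN }

    /** Guesses a type from the file name alone; the content is never inspected. */
    static Guess guess(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        switch (ext) {
            case "fa":
            case "fasta":
                return Guess.FASTA;
            case "embl":
                return Guess.EMBL_DNA;    // assumed to hold DNA
            case "gb":
            case "gbk":
                return Guess.GENBANK_DNA; // assumed to hold DNA
            default:
                return Guess.UNKNOWN;
        }
    }
}
```

      A FASTA file saved as proteins.txt is reported as UNKNOWN here, and a protein file named with a .gb extension would be labelled as DNA, which is the kind of failure the deprecation note warns about.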
    • formatToFactory

      Deprecated.
      as this essentially duplicates the operation available in the method identifyBuilderFactory.
      Attempts to retrieve the most appropriate SequenceBuilderFactory for a given combination of Alphabet and SequenceFormat.
      Parameters:
      format - currently supports FastaFormat, GenbankFormat, EmblLikeFormat
      alpha - currently only supports the DNA and Protein alphabets
      Returns:
      the SequenceBuilderFactory
      Throws:
      BioException - if the combination of alpha and format is unrecognized.
    • fileToBiojava

      public static Object fileToBiojava(String formatName, String alphabetName, BufferedReader br) throws BioException
      Deprecated.
      Reads a file with the specified format and alphabet.
      Parameters:
      formatName - the name of the format, e.g. genbank or swissprot (case insensitive)
      alphabetName - the name of the alphabet, e.g. dna, rna or protein (case insensitive)
      br - a BufferedReader for the input
      Returns:
      either an Alignment object or a SequenceIterator (depending on the format read)
      Throws:
      BioException - if an error occurs while reading or an unrecognized combination of format and alphabet is used (e.g. swissprot and DNA).
      Since:
      1.3
    • fileToBiojava

      public static Object fileToBiojava(int fileType, BufferedReader br) throws BioException
      Deprecated.
      Reads a file and returns the corresponding Biojava object. You need to cast it as an Alignment or a SequenceIterator as appropriate.
      Parameters:
      fileType - a value that describes the file type
      br - the reader for the input
      Returns:
      either a SequenceIterator if the file type is a sequence file, or an Alignment if the file is a sequence alignment.
      Throws:
      BioException - if the file cannot be parsed
    • biojavaToFile

      public static void biojavaToFile(String formatName, String alphabetName, OutputStream os, Object biojava) throws BioException, IOException, IllegalSymbolException
      Deprecated.
      Writes a Biojava SequenceIterator, SequenceDB, Sequence or Alignment to an OutputStream.
      Parameters:
      formatName - e.g. fasta, GenBank (case insensitive)
      alphabetName - e.g. DNA, RNA (case insensitive)
      os - where to write to
      biojava - the object to write
      Throws:
      BioException - problems getting data from the biojava object.
      IOException - if there are IO problems
      IllegalSymbolException - a Symbol cannot be parsed
    • biojavaToFile

      public static void biojavaToFile(int fileType, OutputStream os, Object biojava) throws BioException, IOException, IllegalSymbolException
      Deprecated.
      Converts a Biojava object to the given filetype.
      Parameters:
      fileType - a value that describes the type of sequence file
      os - the stream to write the formatted results to
      biojava - a SequenceIterator, SequenceDB, Sequence, or Alignment
      Throws:
      BioException - if biojava cannot be converted to that format.
      IOException - if the output cannot be written to os
      IllegalSymbolException - if biojava contains a Symbol that cannot be understood by the parser.