Class IndexedSequenceDB

All Implemented Interfaces:
Serializable, SequenceDB, SequenceDBLite, Changeable

public final class IndexedSequenceDB extends AbstractSequenceDB implements SequenceDB, Serializable

This class implements SequenceDB on top of a set of sequence files and sequence offsets within these files.

This class is primarily responsible for managing the sequence IO, such as calculating the sequence file offsets and parsing individual sequences based upon those offsets. The actual persistent storage of all this information is delegated to an instance of IndexStore, such as TabIndexStore.

 // create a new index store and populate it
 // this may take some time
 TabIndexStore indexStore = new TabIndexStore(
   storeFile, indexFile, dbName,
   format, sbFactory, symbolParser );
 IndexedSequenceDB seqDB = new IndexedSequenceDB(indexStore);

 for(int i = 0; i < files.length; i++) {
   seqDB.addFile(files[i]);
 }

 // load an existing index store and fetch a sequence
 // this should be quite quick
 TabIndexStore indexStore = TabIndexStore.open(storeFile);
 SequenceDB seqDB = new IndexedSequenceDB(indexStore);
 Sequence seq = seqDB.getSequence(id);
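
The offset-based lookup described above can be illustrated without BioJava. The following is a minimal sketch, not the library's implementation: it assumes a FASTA-like file, and the class OffsetIndexSketch and its methods buildIndex and fetch are hypothetical names introduced only for this example. It scans the file once to record the byte offset of each header line, then seeks directly to a stored offset to read a single record.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of offset indexing; not part of BioJava.
public class OffsetIndexSketch {

    /** Scan the file once and record the byte offset of every '>' header line. */
    public static Map<String, Long> buildIndex(Path file) throws IOException {
        Map<String, Long> offsets = new LinkedHashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long lineStart = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    // Assumption: the id is the first whitespace-delimited token after '>'.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    offsets.put(id, lineStart);
                }
                lineStart = raf.getFilePointer();
            }
        }
        return offsets;
    }

    /** Seek to a stored offset and read the residues of that one record only. */
    public static String fetch(Path file, long offset) throws IOException {
        StringBuilder residues = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            raf.readLine(); // skip the header line at this offset
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                residues.append(line.trim());
            }
        }
        return residues.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("seqs", ".fa");
        Files.write(file, ">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n"
                .getBytes(StandardCharsets.US_ASCII));
        Map<String, Long> index = buildIndex(file);   // the slow, one-off step
        System.out.println(index);
        System.out.println(fetch(file, index.get("seq2"))); // the fast lookup
        Files.delete(file);
    }
}
```

In this sketch the index lives in memory. In IndexedSequenceDB the equivalent information is persisted by the IndexStore, which is why reopening an existing store is expected to be much faster than building one.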
 

Note: We may be able to improve the indexing speed further by discarding all feature creation and annotation requests during index parsing.

Author:
Matthew Pocock, Thomas Down, Keith James
  • Constructor Details

    • IndexedSequenceDB

      public IndexedSequenceDB(IDMaker idMaker, IndexStore indexStore)
      Create an IndexedSequenceDB by specifying both the IDMaker and IndexStore used.

      The IDMaker will be used to calculate the ID for each Sequence. The database will delegate the storage and retrieval of the sequence offsets to the IndexStore.

      Parameters:
      idMaker - the IDMaker used to calculate Sequence IDs
      indexStore - the IndexStore delegate
    • IndexedSequenceDB

      public IndexedSequenceDB(IndexStore indexStore)
      Create an IndexedSequenceDB by specifying the IndexStore used.

      IDMaker.byName will be used to calculate the ID for each Sequence. The database will delegate the storage and retrieval of the sequence offsets to the IndexStore.

      Parameters:
      indexStore - the IndexStore delegate
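
The relationship between the two constructors can be sketched as a strategy with a default. This is an illustration under stated assumptions, not the BioJava API: SeqRecord, IdStrategy, BY_NAME, BY_URN and Db are hypothetical stand-ins for a parsed sequence, the IDMaker, its named instances, and the database, and a plain Map stands in for the IndexStore delegate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the two-constructor pattern; not part of BioJava.
public class IdStrategySketch {

    /** Stand-in for a parsed sequence that carries both a name and a URN. */
    public static final class SeqRecord {
        public final String name;
        public final String urn;
        public SeqRecord(String name, String urn) {
            this.name = name;
            this.urn = urn;
        }
    }

    /** Stand-in for an ID-making strategy. */
    public interface IdStrategy {
        String calcId(SeqRecord record);

        IdStrategy BY_NAME = record -> record.name;
        IdStrategy BY_URN = record -> record.urn;
    }

    /** Stand-in database: computes ids itself, delegates storage to the map. */
    public static final class Db {
        private final IdStrategy idStrategy;
        private final Map<String, Long> store;

        public Db(IdStrategy idStrategy, Map<String, Long> store) {
            this.idStrategy = idStrategy;
            this.store = store;
        }

        /** Single-argument form: falls back to the by-name strategy. */
        public Db(Map<String, Long> store) {
            this(IdStrategy.BY_NAME, store);
        }

        public void index(SeqRecord record, long offset) {
            store.put(idStrategy.calcId(record), offset);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> byName = new HashMap<>();
        new Db(byName).index(new SeqRecord("seq1", "urn:example:seq1"), 0L);
        System.out.println(byName.keySet());

        Map<String, Long> byUrn = new HashMap<>();
        new Db(IdStrategy.BY_URN, byUrn).index(new SeqRecord("seq1", "urn:example:seq1"), 0L);
        System.out.println(byUrn.keySet());
    }
}
```

The choice of strategy determines which string must later be passed to getSequence, so it should stay the same for the lifetime of an index.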
  • Method Details

    • getIndexStore

      Retrieve the IndexStore.
      Returns:
      the IndexStore delegate
    • addFile

      Add sequences from a file to the sequence database. This method works on an "all or nothing" principle: if it can successfully interpret the entire file, all the sequences will be read in; if it encounters any problems, it will abandon the whole file and an IOException will be thrown. A BioException will be thrown if it has problems understanding the sequences. Multiple files may be indexed into a single database.
      Parameters:
      seqFile - the file containing the sequence or set of sequences
      Throws:
      BioException - if for any reason the sequences can't be read correctly
      ChangeVetoException - if there is a listener that vetoes adding the files
      IllegalIDException
    • getName

      public String getName()
      Get the name of this sequence database. The name is retrieved from the IndexStore delegate.
      Specified by:
      getName in interface SequenceDBLite
      Returns:
      the name of the sequence database, which may be null.
    • getSequence

      Description copied from interface: SequenceDBLite
      Retrieve a single sequence by its id.
      Specified by:
      getSequence in interface SequenceDBLite
      Parameters:
      id - the id to retrieve by
      Returns:
      the Sequence with that id
      Throws:
      IllegalIDException - if the database doesn't know about the id
      BioException - if there was a failure in retrieving the sequence
    • sequenceIterator

      Description copied from interface: SequenceDB
      Returns a SequenceIterator over all sequences in the database. The order of retrieval is undefined.
      Specified by:
      sequenceIterator in interface SequenceDB
      Overrides:
      sequenceIterator in class AbstractSequenceDB
      Returns:
      a SequenceIterator over all sequences
    • ids

      public Set ids()
      Description copied from interface: SequenceDB
      Get an immutable set of all of the IDs in the database. The ids are legal arguments to getSequence.
      Specified by:
      ids in interface SequenceDB
      Returns:
      a Set of ids - at the moment, strings
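
The "all or nothing" behaviour of addFile, the unknown-id failure of getSequence, and the immutable set returned by ids can be sketched together. This is a simplified illustration rather than BioJava's code: AllOrNothingSketch is a hypothetical class, a "file" is modelled as a list of tab-separated "id, residues" lines, IllegalArgumentException stands in for the library's checked exceptions, and rejecting duplicate ids is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of all-or-nothing indexing; not part of BioJava.
public class AllOrNothingSketch {

    private final Map<String, String> committed = new LinkedHashMap<>();

    /** Add every record from one "file", or none of them if any record is bad. */
    public void addFile(List<String> lines) {
        // Stage entries separately so a failure leaves 'committed' untouched.
        Map<String, String> staged = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length != 2 || parts[0].isEmpty()) {
                throw new IllegalArgumentException("Malformed record: " + line);
            }
            // Assumption for this sketch: an id may appear only once.
            if (committed.containsKey(parts[0]) || staged.containsKey(parts[0])) {
                throw new IllegalArgumentException("Duplicate id: " + parts[0]);
            }
            staged.put(parts[0], parts[1]);
        }
        committed.putAll(staged); // reached only if the whole file parsed
    }

    /** Look up one sequence, failing loudly for an unknown id. */
    public String getSequence(String id) {
        String residues = committed.get(id);
        if (residues == null) {
            throw new IllegalArgumentException("Unknown id: " + id);
        }
        return residues;
    }

    /** Read-only view of the ids; every element is a legal getSequence argument. */
    public Set<String> ids() {
        return Collections.unmodifiableSet(committed.keySet());
    }

    public static void main(String[] args) {
        AllOrNothingSketch db = new AllOrNothingSketch();
        db.addFile(Arrays.asList("a\tACGT", "b\tTT"));
        try {
            db.addFile(Arrays.asList("c\tGG", "broken line"));
        } catch (IllegalArgumentException e) {
            System.out.println("Second file rejected: " + e.getMessage());
        }
        System.out.println(db.ids());
    }
}
```

Staging into a separate map before committing is one simple way to guarantee that a partially readable file never leaves stray entries behind. It is shown here only as a design option.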