Class ChunkedSymbolListFactory

java.lang.Object
org.biojava.bio.seq.io.ChunkedSymbolListFactory

public class ChunkedSymbolListFactory extends Object
Class that makes ChunkedSymbolLists with the chunks implemented as SymbolLists themselves.

The advantage is that those SymbolLists can be packed implementations.

You can build a SequenceBuilderFactory that creates a packed, chunked sequence from an input file without making an intermediate symbol list, as follows:

 public class PackedChunkedListFactory implements SequenceBuilderFactory
 {
   public SequenceBuilder makeSequenceBuilder()
   {
     return new SequenceBuilderBase() {
       private ChunkedSymbolListFactory chunker = new ChunkedSymbolListFactory(new PackedSymbolListFactory(true));

       // deal with symbols
       public void addSymbols(Alphabet alpha, Symbol[] syms, int pos, int len)
         throws IllegalAlphabetException
       {
         chunker.addSymbols(alpha, syms, pos, len);
       }

       // make the sequence
       public Sequence makeSequence()
       {
         try {
           // make the SymbolList
           SymbolList symbols = chunker.makeSymbolList();
           seq = new SimpleSequence(symbols, uri, name, annotation);

           // call superclass method
           return super.makeSequence();
         }
         catch (IllegalAlphabetException iae) {
           // keep the original exception as the cause so the failure can be traced
           throw new BioError("couldn't create symbol list", iae);
         }
       }
     };
   }
 }
 

FASTA files can then be read with something like:

 SequenceIterator seqI = new StreamReader(br, new FastaFormat(),
     DNATools.getDNA().getTokenization("token"),
     new PackedChunkedListFactory() );
 

Blend to suit taste.

Alternatively, you can feed Symbols to the factory directly with addSymbols and build the SymbolList at the end with makeSymbolList.
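The add-then-make pattern can be illustrated with a minimal, library-free sketch. It is not the BioJava implementation: ChunkAccumulator, ChunkedList, makeList and the chunk size are hypothetical names and values chosen for illustration, and plain chars stand in for Symbols. It only shows how symbols arriving in arbitrary batches can be collected into fixed-size chunks and assembled once at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the factory: plain chars play the role of Symbols.
class ChunkAccumulator {
    private final int chunkSize;
    private final List<char[]> fullChunks = new ArrayList<>();
    private char[] current;   // chunk currently being filled
    private int used = 0;     // slots of 'current' already filled
    private int length = 0;   // total symbols added so far

    public ChunkAccumulator(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.current = new char[chunkSize];
    }

    // Same (syms, pos, len) shape as addSymbols: copy len symbols starting at pos.
    public void addSymbols(char[] syms, int pos, int len) {
        while (len > 0) {
            if (used == chunkSize) {          // current chunk is full: retire it
                fullChunks.add(current);
                current = new char[chunkSize];
                used = 0;
            }
            int n = Math.min(len, chunkSize - used);
            System.arraycopy(syms, pos, current, used, n);
            used += n;
            pos += n;
            len -= n;
            length += n;
        }
    }

    // Assemble the chunked list after the last addSymbols call.
    public ChunkedList makeList() {
        List<char[]> all = new ArrayList<>(fullChunks);
        if (used > 0) {
            all.add(Arrays.copyOf(current, used));  // trim the partial tail chunk
        }
        return new ChunkedList(all, chunkSize, length);
    }

    // Read-only view over the chunks.
    public static final class ChunkedList {
        private final List<char[]> chunks;
        private final int chunkSize;
        private final int length;

        ChunkedList(List<char[]> chunks, int chunkSize, int length) {
            this.chunks = chunks;
            this.chunkSize = chunkSize;
            this.length = length;
        }

        public int length() { return length; }

        public int chunkCount() { return chunks.size(); }

        // Indices count from 1, following the SymbolList convention.
        public char symbolAt(int index) {
            if (index < 1 || index > length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            int offset = index - 1;
            return chunks.get(offset / chunkSize)[offset % chunkSize];
        }
    }
}
```

In this sketch, each accumulator is used for one list only: add symbols in as many calls as needed, call makeList once, then discard the accumulator.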

NOTE: Small sequences are built with an internal default SymbolList factory. This gives faster SymbolList creation and access for small sequences, while still allowing a more space-efficient implementation to be selected for large sequences.
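The trade-off described in this note can be sketched in plain Java. Everything here is an assumption made for illustration: SymbolStore, PlainStore, PackedStore, StoreSelector, the threshold parameter and the 2-bit encoding of unambiguous DNA are hypothetical and are not taken from the BioJava sources. The sketch only shows the general idea of choosing a fast representation below a size threshold and a compact one above it.

```java
// Hypothetical storage interface; indices count from 1, as SymbolList does.
interface SymbolStore {
    int length();
    char symbolAt(int index);
}

// Fast path: one char per symbol, no decoding on access.
final class PlainStore implements SymbolStore {
    private final char[] syms;

    public PlainStore(char[] syms) {
        this.syms = syms.clone();
    }

    public int length() { return syms.length; }

    public char symbolAt(int index) {
        if (index < 1 || index > syms.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return syms[index - 1];
    }
}

// Compact path: 2 bits per symbol, 32 symbols per long (unambiguous DNA only).
final class PackedStore implements SymbolStore {
    private static final String ALPHABET = "acgt";
    private final long[] words;
    private final int length;

    public PackedStore(char[] syms) {
        length = syms.length;
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(syms[i]);
            if (code < 0) {
                throw new IllegalArgumentException("not a DNA symbol: " + syms[i]);
            }
            words[i / 32] |= ((long) code) << ((i % 32) * 2);
        }
    }

    public int length() { return length; }

    public char symbolAt(int index) {
        if (index < 1 || index > length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int i = index - 1;
        int code = (int) ((words[i / 32] >>> ((i % 32) * 2)) & 3L);
        return ALPHABET.charAt(code);
    }
}

final class StoreSelector {
    private StoreSelector() {}

    // 'threshold' is an illustrative parameter, not a value taken from the library.
    public static SymbolStore choose(char[] syms, int threshold) {
        return syms.length < threshold ? new PlainStore(syms) : new PackedStore(syms);
    }
}
```

The plain store costs 16 bits per symbol but reads with a single array access; the packed store uses 2 bits per symbol at the price of a shift and mask on every read.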

NOTE: This class is inherently not thread-safe. You should create one instance for each symbol list you wish to manufacture, and then discard that instance.

Author:
David Huen