Class BlastXMLParser

java.lang.Object
org.biojava.utils.stax.StAXContentHandlerBase
org.biojava.bio.program.sax.blastxml.BlastXMLParser
All Implemented Interfaces:
StAXContentHandler

This class parses NCBI Blast XML output.

It has two modes:

i) Single output document mode: this takes a document containing a single BlastOutput element and parses it. Such a document is generated when a single query is searched against a sequence database.

ii) Multiple query document mode: unfortunately, NCBI BLAST concatenates the results of multiple searches into one file. The result is not well-formed XML, because it repeats the XML declaration and DOCTYPE for every search and has more than one root element. This parser will instead take a massaged version of this output in which the BlastOutput elements are wrapped in a single blast_aggregate element.

The massaged form is generated by stripping the XML declarations and DOCTYPE declarations and wrapping all the BlastOutput elements in a single blast_aggregate element. On Linux, this can be done with:

 #!/bin/sh
 # Converts a Blast XML output to something vaguely well-formed
 # for parsing.
 # Use: blast_aggregate <input file> <output file>

 # strips all <?xml> and <!DOCTYPE> tags
 # encapsulates the multiple <BlastOutput> elements into <blast_aggregate>

 sed '/<?xml/d' "$1" | sed '/<!DOCTYPE/d' | sed '1i\
 <blast_aggregate>
 $a\
 </blast_aggregate>' > "$2"
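
Where sed is not available, the same massaging could be done in Java. The following is only a minimal sketch and not part of BioJava: the class name BlastAggregate and the methods isPrologLine and aggregate are hypothetical. Like the script above, it works line by line and assumes each XML declaration and DOCTYPE sits on a line of its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of BioJava: mirrors the shell script above.
class BlastAggregate {

    // True for the lines the script deletes: XML declarations and DOCTYPEs.
    static boolean isPrologLine(String line) {
        return line.contains("<?xml") || line.contains("<!DOCTYPE");
    }

    // Drops the prolog lines and wraps what is left in <blast_aggregate>.
    static String aggregate(String concatenated) {
        StringBuilder out = new StringBuilder("<blast_aggregate>\n");
        for (String line : concatenated.split("\\R")) {
            if (!isPrologLine(line)) {
                out.append(line).append('\n');
            }
        }
        return out.append("</blast_aggregate>\n").toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Use: java BlastAggregate <input file> <output file>");
            System.exit(1);
        }
        // Reads the whole file into memory; assumes UTF-8 and a modest file size.
        String input = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]),
                aggregate(input).getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the sketch reads the whole input into memory, very large BLAST runs would be better served by a streaming variant or by the shell script.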
Author:
David Huen
  • Field Details

    • staxenv

      public org.biojava.bio.program.sax.blastxml.StAXFeatureHandler staxenv
      Nesting class that provides callback interfaces to nested classes.
  • Constructor Details

  • Method Details