Class POIOLE2TextExtractor

java.lang.Object
org.apache.poi.extractor.POITextExtractor
org.apache.poi.extractor.POIOLE2TextExtractor
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor, OutlookTextExtactor, PowerPointExtractor, PublisherTextExtractor, VisioTextExtractor, Word6Extractor, WordExtractor

public abstract class POIOLE2TextExtractor extends POITextExtractor
Common parent for OLE2-based text extractors of POI documents, such as .doc and .xls. You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
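The package convention described above can be expressed as a small lookup. The following is only a sketch: the class `ExtractorPackages`, the method `packageFor`, and the extension table are hypothetical names introduced for this example and are not part of POI. The module names (hwpf, hssf, hslf, hdgf, hpbf, hsmf) are the POI components that handle the corresponding OLE2 formats.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of POI) illustrating org.apache.poi.[format].extractor. */
final class ExtractorPackages {

    // File extension -> POI format module for that OLE2 format.
    private static final Map<String, String> MODULE_BY_EXTENSION = Map.of(
            "doc", "hwpf",   // Word
            "xls", "hssf",   // Excel
            "ppt", "hslf",   // PowerPoint
            "vsd", "hdgf",   // Visio
            "pub", "hpbf",   // Publisher
            "msg", "hsmf");  // Outlook

    private ExtractorPackages() {}

    /** Returns the extractor package for an extension such as ".doc" or "xls", if known. */
    static Optional<String> packageFor(String extension) {
        String key = extension.toLowerCase(Locale.ROOT);
        if (key.startsWith(".")) {
            key = key.substring(1);
        }
        return Optional.ofNullable(MODULE_BY_EXTENSION.get(key))
                .map(module -> "org.apache.poi." + module + ".extractor");
    }

    public static void main(String[] args) {
        System.out.println(packageFor(".doc").orElse("unknown")); // org.apache.poi.hwpf.extractor
        System.out.println(packageFor("xls").orElse("unknown"));  // org.apache.poi.hssf.extractor
    }
}
```

For example, WordExtractor and Word6Extractor live in org.apache.poi.hwpf.extractor, and ExcelExtractor lives in org.apache.poi.hssf.extractor.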
  • Field Details

    • document

      protected POIDocument document
      The POIDocument that's open
  • Constructor Details

    • POIOLE2TextExtractor

      public POIOLE2TextExtractor(POIDocument document)
      Creates a new text extractor for the given document.
      Parameters:
      document - The POIDocument to use in this extractor.
    • POIOLE2TextExtractor

      protected POIOLE2TextExtractor(POIOLE2TextExtractor otherExtractor)
      Creates a new text extractor, using the same document as another text extractor. Normally only used by properties extractors.
      Parameters:
      otherExtractor - the extractor whose document is to be used.
  • Method Details

    • getDocSummaryInformation

      public DocumentSummaryInformation getDocSummaryInformation()
      Returns the document summary information metadata for the document.
      Returns:
      The Document Summary Information or null if it could not be read for this document.
    • getSummaryInformation

      public SummaryInformation getSummaryInformation()
      Returns the summary information metadata for the document.
      Returns:
      The Summary information for the document or null if it could not be read for this document.
    • getMetadataTextExtractor

      public POITextExtractor getMetadataTextExtractor()
      Returns an HPSF powered text extractor for the document properties metadata, such as title and author.
      Specified by:
      getMetadataTextExtractor in class POITextExtractor
      Returns:
      an instance of POITextExtractor that can extract metadata.
    • getRoot

      public DirectoryEntry getRoot()
      Returns the underlying DirectoryEntry of this document.
      Returns:
      the DirectoryEntry that is associated with the POIDocument of this extractor.
    • getDocument

      public POIDocument getDocument()
      Returns the underlying POIDocument.
      Specified by:
      getDocument in class POITextExtractor
      Returns:
      the underlying POIDocument
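
Because getDocSummaryInformation() and getSummaryInformation() may return null when the metadata could not be read, callers need a null check before reading any property. A minimal, library-independent sketch of that check follows. The class `NullSafeMetadata` and the method `readOrDefault` are hypothetical names for this example, and the plain `String` values in `main` merely stand in for a metadata object.

```java
import java.util.function.Function;

/** Hypothetical helper (not part of POI) for reading a property from a possibly-null object. */
final class NullSafeMetadata {

    private NullSafeMetadata() {}

    /**
     * Applies the getter to info, returning the fallback when info itself is null
     * or when the getter yields null.
     */
    static <T, R> R readOrDefault(T info, Function<? super T, ? extends R> getter, R fallback) {
        if (info == null) {
            return fallback;
        }
        R value = getter.apply(info);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        String missing = null; // stands in for metadata that could not be read
        System.out.println(readOrDefault(missing, String::trim, "(untitled)"));      // (untitled)
        System.out.println(readOrDefault("  Report  ", String::trim, "(untitled)")); // Report
    }
}
```

With POI on the classpath, a call such as `readOrDefault(extractor.getSummaryInformation(), SummaryInformation::getTitle, "(untitled)")` would apply the same check to a real extractor.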