Package org.apache.poi.extractor
Class POITextExtractor
java.lang.Object
org.apache.poi.extractor.POITextExtractor
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
POIOLE2TextExtractor, POIXMLTextExtractor, SlideShowExtractor
Common parent for text extractors of POI documents. You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
Constructor Summary
Constructors
- POITextExtractor()
Method Summary
Modifier and Type, Method, Description
- void close(): Frees the resources of the extractor as soon as it is no longer needed.
- abstract Object getDocument()
- abstract POITextExtractor getMetadataTextExtractor(): Returns another text extractor that can output the textual content of the document's metadata / properties, such as author and title.
- abstract String getText(): Retrieves all the text from the document.
- void setFilesystem(fs): Used to ensure file handle cleanup.
-
Constructor Details
-
POITextExtractor
public POITextExtractor()
-
-
Method Details
-
getText
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadoc of the specific project for details.
- Returns:
- All the text from the document
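To illustrate why the separators are implementation-specific, here is a minimal sketch using hypothetical stand-in classes (SketchExtractor, SketchSheetExtractor and SketchParagraphExtractor are not POI classes). It models only the getText() contract with plain JDK types; the tab, newline and blank-line conventions are assumptions made for this example, and real extractors may choose differently.

```java
import java.util.List;

// Hypothetical stand-in for the getText() contract; not the POI class itself.
abstract class SketchExtractor {
    abstract String getText();
}

// Assumed convention for this sketch: cells separated by tabs, rows ended by newlines.
class SketchSheetExtractor extends SketchExtractor {
    private final List<List<String>> rows;

    SketchSheetExtractor(List<List<String>> rows) {
        this.rows = rows;
    }

    @Override
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }
}

// Assumed convention for this sketch: paragraphs separated by one blank line.
class SketchParagraphExtractor extends SketchExtractor {
    private final List<String> paragraphs;

    SketchParagraphExtractor(List<String> paragraphs) {
        this.paragraphs = paragraphs;
    }

    @Override
    String getText() {
        return String.join("\n\n", paragraphs);
    }
}
```

Callers that post-process the returned string should therefore not assume one separator scheme across formats.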
-
getMetadataTextExtractor
Returns another text extractor that can output the textual content of the document's metadata / properties, such as author and title.
- Returns:
- the metadata and text extractor
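The relationship between a document extractor and its metadata extractor can be sketched as follows. The classes below (DocExtractorSketch, BodyExtractorSketch, PropertiesExtractorSketch) are hypothetical stand-ins written against the JDK only, not POI types, and the "key = value" output format is an assumption for illustration.

```java
import java.util.Map;

// Hypothetical stand-in mirroring the two abstract methods described above;
// it is not org.apache.poi.extractor.POITextExtractor.
abstract class DocExtractorSketch {
    abstract String getText();

    abstract DocExtractorSketch getMetadataTextExtractor();
}

// Renders document properties (author, title, ...) as text.
class PropertiesExtractorSketch extends DocExtractorSketch {
    private final Map<String, String> properties;

    PropertiesExtractorSketch(Map<String, String> properties) {
        this.properties = properties;
    }

    @Override
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    @Override
    DocExtractorSketch getMetadataTextExtractor() {
        // Assumption for this sketch: metadata has no metadata of its own.
        throw new IllegalStateException("no metadata extractor for metadata");
    }
}

// Returns the body text and hands out a companion extractor for the properties.
class BodyExtractorSketch extends DocExtractorSketch {
    private final String body;
    private final Map<String, String> properties;

    BodyExtractorSketch(String body, Map<String, String> properties) {
        this.body = body;
        this.properties = properties;
    }

    @Override
    String getText() {
        return body;
    }

    @Override
    DocExtractorSketch getMetadataTextExtractor() {
        return new PropertiesExtractorSketch(properties);
    }
}
```

The point of the design is that body text and metadata text are both read through the same getText() call, only on different extractor instances.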
-
setFilesystem
Used to ensure file handle cleanup.
- Parameters:
fs - filesystem to close
-
close
Frees the resources of the extractor as soon as it is no longer needed. This may include closing open file handles and freeing memory. The extractor cannot be used after close has been called.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
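Because the class implements Closeable, an extractor fits naturally into a try-with-resources statement. The sketch below uses a hypothetical stand-in (ClosableExtractorSketch, not the POI class) to model the documented contract: a filesystem registered through setFilesystem is closed along with the extractor. Throwing IllegalStateException on use after close is an assumption of this sketch; the text above only says the extractor cannot be used afterwards.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in, not the POI class.
class ClosableExtractorSketch implements Closeable {
    private final String text;
    private Closeable fs;      // resource registered for cleanup, may be null
    private boolean closed;

    ClosableExtractorSketch(String text) {
        this.text = text;
    }

    // Mirrors the role of setFilesystem: remember what to close later.
    void setFilesystem(Closeable fs) {
        this.fs = fs;
    }

    String getText() {
        if (closed) {
            // Assumed behavior for this sketch only.
            throw new IllegalStateException("extractor already closed");
        }
        return text;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        if (fs != null) {
            fs.close();
        }
    }
}

class CloseSketchDemo {
    // Returns true if try-with-resources closed the registered resource.
    static boolean run() {
        final boolean[] fsClosed = {false};
        try (ClosableExtractorSketch ex = new ClosableExtractorSketch("body text")) {
            ex.setFilesystem(() -> fsClosed[0] = true);
            ex.getText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fsClosed[0];
    }
}
```

Calling CloseSketchDemo.run() exercises the whole path: the extractor is used inside the try block and its registered resource is released when the block exits, even if getText() were to throw.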
-
getDocument
- Returns:
- the processed document
-