Package org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
java.lang.Object
org.apache.poi.extractor.POITextExtractor
org.apache.poi.ooxml.extractor.POIXMLTextExtractor
org.apache.poi.xwpf.extractor.XWPFWordExtractor
- All Implemented Interfaces:
Closeable, AutoCloseable
Helper class to extract text from an OOXML Word file.
-
Field Summary
Fields -
Constructor Summary
Constructors
Constructor Description
XWPFWordExtractor(OPCPackage container)
XWPFWordExtractor(XWPFDocument document)
-
Method Summary
Modifier and Type Method Description
void appendBodyElementText
void appendParagraphText(StringBuilder text, XWPFParagraph paragraph)
String getText() Retrieves all the text from the document.
static void main
void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns) Should we concatenate phonetic runs in extraction.
void setFetchHyperlinks(boolean fetch) Should we also fetch the hyperlinks, when fetching the text content? Default is to only output the hyperlink label, and not the contents
Methods inherited from class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
checkMaxTextSize, close, getCoreProperties, getCustomProperties, getDocument, getExtendedProperties, getMetadataTextExtractor, getPackage
Methods inherited from class org.apache.poi.extractor.POITextExtractor
setFilesystem
-
Field Details
-
SUPPORTED_TYPES
-
-
Constructor Details
-
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container) throws org.apache.xmlbeans.XmlException, OpenXML4JException, IOException
- Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
IOException
-
XWPFWordExtractor
-
-
Method Details
-
main
- Throws:
Exception
-
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
Sets whether hyperlinks are also fetched when fetching the text content. By default only the hyperlink label is output, not the contents.
-
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Sets whether phonetic runs are concatenated during extraction. The default is true.
- Parameters:
concatenatePhoneticRuns
-
-
getText
Description copied from class: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs etc. are separated in the text is implementation specific - see the javadocs for a specific project for details.
- Specified by:
getText in class POITextExtractor
- Returns:
All the text from the document
-
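The constructor, the two setters and getText() documented above can be combined roughly as follows. This is a minimal sketch, not an official sample: the class name ExtractTextExample and its helper methods are hypothetical, the document is built in memory with XWPFDocument so that no input file is needed, and it assumes that closing the extractor also releases the underlying package.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical example class; requires Apache POI (poi-ooxml) on the classpath.
public class ExtractTextExample {

    /** Builds a small .docx in memory, one paragraph per argument. */
    static byte[] buildSampleDocx(String... paragraphs) throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (String p : paragraphs) {
                doc.createParagraph().createRun().setText(p);
            }
            doc.write(out);
            return out.toByteArray();
        }
    }

    /** Reads the .docx bytes back and returns the extracted text. */
    static String extractText(byte[] docx) throws IOException {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        // Assumption: closing the extractor releases the document's package,
        // so the document is not closed a second time here.
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setFetchHyperlinks(true);          // also fetch hyperlinks, not only labels
            extractor.setConcatenatePhoneticRuns(true);  // documented default, set explicitly
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = buildSampleDocx("First paragraph", "Second paragraph");
        System.out.print(extractText(docx));
    }
}
```

How paragraphs are separated in the returned string is implementation specific, so callers should not rely on an exact layout beyond the paragraph text itself.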
appendBodyElementText
-
appendParagraphText
-
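Judging by its signature, appendParagraphText(StringBuilder text, XWPFParagraph paragraph) appends the text of a single paragraph to a caller-supplied buffer, which allows extracting only selected paragraphs. The sketch below illustrates that reading; the class AppendParagraphExample and the method textOfParagraph are hypothetical names, and the exact characters appended (for example trailing separators) are not specified here.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical example class; requires Apache POI (poi-ooxml) on the classpath.
public class AppendParagraphExample {

    /** Returns the extracted text of the paragraph at the given index. */
    static String textOfParagraph(XWPFDocument doc, int index) {
        // The extractor is not closed here because the caller owns the document.
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        StringBuilder sb = new StringBuilder();
        XWPFParagraph paragraph = doc.getParagraphs().get(index);
        extractor.appendParagraphText(sb, paragraph);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("alpha");
            doc.createParagraph().createRun().setText("beta");
            // Extract only the second paragraph.
            System.out.println(textOfParagraph(doc, 1));
        }
    }
}
```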