Package org.apache.pdfbox.tools
Class ExtractText
- java.lang.Object
  - org.apache.pdfbox.tools.ExtractText
public final class ExtractText extends java.lang.Object
This is the main program; it parses a PDF document and transforms it into text.
-
-
Field Summary
Fields
Modifier and Type, Field, Description
private static java.lang.String
ALWAYSNEXT
private static java.lang.String
CONSOLE
private static java.lang.String
DEBUG
private boolean
debugOutput
private static java.lang.String
ENCODING
private static java.lang.String
END_PAGE
private static java.lang.String
HTML
private static java.lang.String
IGNORE_BEADS
private static org.apache.commons.logging.Log
LOG
private static java.lang.String
PASSWORD
private static java.lang.String
ROTATION_MAGIC
private static java.lang.String
SORT
private static java.lang.String
START_PAGE
private static java.lang.String
STD_ENCODING
-
Constructor Summary
Constructors
Modifier, Constructor, Description
private
ExtractText()
Private constructor.
-
Method Summary
Modifier and Type, Method, Description
private void
extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext)
(package private) static int
getAngle(TextPosition text)
static void
main(java.lang.String[] args)
Infamous main method.
void
startExtraction(java.lang.String[] args)
Starts the text extraction.
private long
startProcessing(java.lang.String message)
private void
stopProcessing(java.lang.String message, long startTime)
private static void
usage()
This will print the usage requirements and exit.
-
-
-
Field Detail
-
LOG
private static final org.apache.commons.logging.Log LOG
-
PASSWORD
private static final java.lang.String PASSWORD
- See Also:
- Constant Field Values
-
ENCODING
private static final java.lang.String ENCODING
- See Also:
- Constant Field Values
-
CONSOLE
private static final java.lang.String CONSOLE
- See Also:
- Constant Field Values
-
START_PAGE
private static final java.lang.String START_PAGE
- See Also:
- Constant Field Values
-
END_PAGE
private static final java.lang.String END_PAGE
- See Also:
- Constant Field Values
-
SORT
private static final java.lang.String SORT
- See Also:
- Constant Field Values
-
IGNORE_BEADS
private static final java.lang.String IGNORE_BEADS
- See Also:
- Constant Field Values
-
DEBUG
private static final java.lang.String DEBUG
- See Also:
- Constant Field Values
-
HTML
private static final java.lang.String HTML
- See Also:
- Constant Field Values
-
ALWAYSNEXT
private static final java.lang.String ALWAYSNEXT
- See Also:
- Constant Field Values
-
ROTATION_MAGIC
private static final java.lang.String ROTATION_MAGIC
- See Also:
- Constant Field Values
-
STD_ENCODING
private static final java.lang.String STD_ENCODING
- See Also:
- Constant Field Values
-
debugOutput
private boolean debugOutput
-
-
Method Detail
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
Infamous main method.
- Parameters:
args - Command-line arguments; there should be one, a reference to a file.
- Throws:
java.io.IOException - if there is an error reading the document or extracting the text.
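The field names above (START_PAGE, END_PAGE, CONSOLE and so on) suggest that main accepts option flags in addition to the file reference. The following is a minimal sketch of how such an argument array might be scanned. It is not the PDFBox implementation: the class ExtractTextArgsSketch, the Options holder, and the flag spellings "-startPage", "-endPage" and "-console" are assumptions made for illustration, since the actual constant values are only listed on the Constant Field Values page.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractTextArgsSketch {

    /** Hypothetical holder for parsed options; not part of PDFBox. */
    static final class Options {
        int startPage = 1;
        int endPage = Integer.MAX_VALUE;
        boolean toConsole = false;
        final List<String> files = new ArrayList<>();
    }

    // Assumed flag spellings; the real values are on the Constant Field Values page.
    private static final String START_PAGE = "-startPage";
    private static final String END_PAGE = "-endPage";
    private static final String CONSOLE = "-console";

    /** Walks the array once; anything that is not a known flag is treated as a file name. */
    static Options parse(String[] args) {
        Options options = new Options();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (START_PAGE.equals(arg) || END_PAGE.equals(arg)) {
                // These flags consume the following argument as their value.
                if (++i >= args.length) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                int page = Integer.parseInt(args[i]);
                if (START_PAGE.equals(arg)) {
                    options.startPage = page;
                } else {
                    options.endPage = page;
                }
            } else if (CONSOLE.equals(arg)) {
                options.toConsole = true;
            } else {
                options.files.add(arg);
            }
        }
        if (options.files.isEmpty()) {
            throw new IllegalArgumentException("No input file given");
        }
        return options;
    }

    public static void main(String[] args) {
        Options o = parse(new String[] {"-startPage", "2", "-endPage", "5", "-console", "input.pdf"});
        System.out.println(o.startPage + ".." + o.endPage
                + " console=" + o.toConsole + " files=" + o.files);
    }
}
```

Treating every unrecognized token as a file name keeps the loop short, at the cost of not reporting misspelled flags; a real tool would more likely print its usage text in that case.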
-
startExtraction
public void startExtraction(java.lang.String[] args) throws java.io.IOException
Starts the text extraction.
- Parameters:
args - the command-line arguments.
- Throws:
java.io.IOException - if there is an error reading the document or extracting the text.
-
extractPages
private void extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext) throws java.io.IOException
- Throws:
java.io.IOException
-
startProcessing
private long startProcessing(java.lang.String message)
-
stopProcessing
private void stopProcessing(java.lang.String message, long startTime)
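The signatures of startProcessing and stopProcessing (one returns a long, the other takes it back as startTime) suggest a simple wall-clock timing pair, presumably gated by the debugOutput field. The sketch below illustrates that pattern under those assumptions. The class name TimingSketch is hypothetical, and, unlike the documented void method, this stopProcessing returns the elapsed seconds so the result can be inspected.

```java
public class TimingSketch {

    // Mirrors the documented debugOutput field; whether it gates the
    // messages in the real class is an assumption.
    private final boolean debugOutput;

    TimingSketch(boolean debugOutput) {
        this.debugOutput = debugOutput;
    }

    /** Optionally prints the message, then returns the current time in milliseconds. */
    long startProcessing(String message) {
        if (debugOutput) {
            System.err.println(message);
        }
        return System.currentTimeMillis();
    }

    /**
     * Computes the seconds elapsed since startTime and optionally prints them.
     * Returning the value is an addition made for this sketch.
     */
    float stopProcessing(String message, long startTime) {
        float elapsedSeconds = (System.currentTimeMillis() - startTime) / 1000f;
        if (debugOutput) {
            System.err.println(message + elapsedSeconds + " seconds");
        }
        return elapsedSeconds;
    }

    public static void main(String[] args) throws InterruptedException {
        TimingSketch timing = new TimingSketch(true);
        long start = timing.startProcessing("Starting text extraction");
        Thread.sleep(50); // stand-in for the actual work
        timing.stopProcessing("Time for extraction: ", start);
    }
}
```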
-
getAngle
static int getAngle(TextPosition text)
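This page gives no description of getAngle. One plausible reading, given the rotationMagic parameter of extractPages, is that it derives the rotation of a run of text from its text matrix. For a pure rotation by θ, a PDF matrix [a b c d e f] has b = sin θ and d = cos θ, so the angle can be recovered with atan2. The sketch below shows only that arithmetic on plain numbers. The class AngleSketch and the method angleDegrees are hypothetical, and the actual method may compute its result differently.

```java
public class AngleSketch {

    /**
     * Recovers a rotation angle in whole degrees from two matrix entries.
     * b is the sine-like entry and d the cosine-like entry of [a b c d e f].
     */
    static int angleDegrees(double b, double d) {
        return (int) Math.round(Math.toDegrees(Math.atan2(b, d)));
    }

    public static void main(String[] args) {
        // Build the two entries for a few known rotations and recover the angle.
        for (int degrees : new int[] {0, 90, 180, -90}) {
            double theta = Math.toRadians(degrees);
            int recovered = angleDegrees(Math.sin(theta), Math.cos(theta));
            System.out.println(degrees + " -> " + recovered);
        }
    }
}
```

Rounding to an integer means text rotated by, say, 89.6 degrees is grouped with text at exactly 90 degrees, which is convenient if pages are processed one angle at a time.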
-
usage
private static void usage()
This will print the usage requirements and exit.
-
-