Class ExtractText


  • public final class ExtractText
    extends java.lang.Object
    This is the main program, which parses a PDF document and transforms it into text.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static java.lang.String ALWAYSNEXT  
      private static java.lang.String CONSOLE  
      private static java.lang.String DEBUG  
      private boolean debugOutput  
      private static java.lang.String ENCODING  
      private static java.lang.String END_PAGE  
      private static java.lang.String HTML  
      private static java.lang.String IGNORE_BEADS  
      private static org.apache.commons.logging.Log LOG  
      private static java.lang.String PASSWORD  
      private static java.lang.String ROTATION_MAGIC  
      private static java.lang.String SORT  
      private static java.lang.String START_PAGE  
      private static java.lang.String STD_ENCODING  
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private ExtractText()
      Private constructor.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      private void extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext)
      (package private) static int getAngle(TextPosition text)
      static void main(java.lang.String[] args)
      Infamous main method.
      void startExtraction(java.lang.String[] args)
      Starts the text extraction.
      private long startProcessing(java.lang.String message)
      private void stopProcessing(java.lang.String message, long startTime)
      private static void usage()
      This will print the usage requirements and exit.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ExtractText

        private ExtractText()
        Private constructor.
    • Method Detail

      • main

        public static void main(java.lang.String[] args)
                         throws java.io.IOException
        Infamous main method.
        Parameters:
        args - Command-line arguments; there should be one, a reference to a file.
        Throws:
        java.io.IOException - if there is an error reading the document or extracting the text.
      • startExtraction

        public void startExtraction(java.lang.String[] args)
                             throws java.io.IOException
        Starts the text extraction.
        Parameters:
        args - the command-line arguments.
        Throws:
        java.io.IOException - if there is an error reading the document or extracting the text.
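The constant names in the field summary (PASSWORD, ENCODING, START_PAGE, END_PAGE, SORT, CONSOLE) suggest that startExtraction walks the argument array and picks out option flags before the file names. The following is a minimal sketch of that kind of loop, not the actual implementation: the class ExtractTextArgsSketch, its fields, the flag spellings (-password, -encoding, -startPage, -endPage, -sort, -console) and the UTF-8 default are all assumptions inferred from the constant names, since this page does not list their values.

```java
/** Hypothetical option holder for illustration; not part of PDFBox. */
final class ExtractTextArgsSketch {
    String password = "";
    String encoding = "UTF-8";          // assumed default, mirroring STD_ENCODING
    int startPage = 1;
    int endPage = Integer.MAX_VALUE;
    boolean sort;
    boolean toConsole;
    String pdfFile;
    String outputFile;

    /** Parses flags first-come, treating the first two non-flag arguments as file names. */
    static ExtractTextArgsSketch parse(String[] args) {
        ExtractTextArgsSketch o = new ExtractTextArgsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                // Flag spellings below are assumptions, not taken from this page.
                case "-password":  o.password = value(args, ++i); break;
                case "-encoding":  o.encoding = value(args, ++i); break;
                case "-startPage": o.startPage = Integer.parseInt(value(args, ++i)); break;
                case "-endPage":   o.endPage = Integer.parseInt(value(args, ++i)); break;
                case "-sort":      o.sort = true; break;
                case "-console":   o.toConsole = true; break;
                default:
                    if (o.pdfFile == null) {
                        o.pdfFile = args[i];
                    } else {
                        o.outputFile = args[i];
                    }
            }
        }
        if (o.pdfFile == null) {
            throw new IllegalArgumentException("no input file given");
        }
        return o;
    }

    /** Returns the value following a flag, or fails if the flag is the last argument. */
    private static String value(String[] args, int i) {
        if (i >= args.length) {
            throw new IllegalArgumentException("missing value for " + args[i - 1]);
        }
        return args[i];
    }

    public static void main(String[] args) {
        ExtractTextArgsSketch o = parse(new String[] {"-startPage", "2", "-sort", "in.pdf"});
        System.out.println(o.pdfFile + " from page " + o.startPage + ", sort=" + o.sort);
    }
}
```

Throwing IllegalArgumentException is a choice made for this sketch; the usage() method documented on this page prints the usage requirements and exits instead.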
      • extractPages

        private void extractPages(int startPage,
                                  int endPage,
                                  PDFTextStripper stripper,
                                  PDDocument document,
                                  java.io.Writer output,
                                  boolean rotationMagic,
                                  boolean alwaysNext)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • startProcessing

        private long startProcessing(java.lang.String message)
      • stopProcessing

        private void stopProcessing(java.lang.String message,
                                    long startTime)
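This page gives no description for startProcessing and stopProcessing, but their signatures (one returns a long, the other accepts it back as startTime) together with the debugOutput field point to a simple timing pair. The sketch below shows one plausible reading under that assumption: the class ProcessingTimerSketch, the injected PrintStream, and the output format are hypothetical and are not taken from the library.

```java
import java.io.PrintStream;

/** Hypothetical timing helper for illustration; not part of PDFBox. */
final class ProcessingTimerSketch {
    private final boolean debugOutput;
    private final PrintStream out;

    ProcessingTimerSketch(boolean debugOutput, PrintStream out) {
        this.debugOutput = debugOutput;
        this.out = out;
    }

    /** Prints the message when debugging is on and returns the start timestamp in milliseconds. */
    long startProcessing(String message) {
        if (debugOutput) {
            out.println(message);
        }
        return System.currentTimeMillis();
    }

    /** Prints the message followed by the elapsed time in seconds when debugging is on. */
    void stopProcessing(String message, long startTime) {
        if (debugOutput) {
            float elapsedSeconds = (System.currentTimeMillis() - startTime) / 1000f;
            out.println(message + elapsedSeconds + " seconds");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ProcessingTimerSketch timer = new ProcessingTimerSketch(true, System.err);
        long start = timer.startProcessing("Starting text extraction");
        Thread.sleep(50); // stand-in for the real work
        timer.stopProcessing("Time for extraction: ", start);
    }
}
```

Passing the start time back explicitly keeps the helper stateless, so several phases (loading, extraction) could be timed independently with the same pair of calls.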
      • usage

        private static void usage()
        This will print the usage requirements and exit.