Class PreflightParser


  • public class PreflightParser
    extends PDFParser
    • Field Detail

      • encoding

        public static final java.nio.charset.Charset encoding
        A one-byte encoding that has no specific mapping in the UTF-8 charset. It avoids unexpected errors when the encoding is Cp5816.
      • dataSource

        protected javax.activation.DataSource dataSource
    • Constructor Detail

      • PreflightParser

        public PreflightParser​(java.io.File file)
                        throws java.io.IOException
        Constructor.
        Parameters:
        file - the PDF file to parse
        Throws:
        java.io.IOException - if there is a reading error.
      • PreflightParser

        public PreflightParser​(java.io.File file,
                               ScratchFile scratch)
                        throws java.io.IOException
        Constructor.
        Parameters:
        file - the PDF file to parse
        scratch - the ScratchFile used for temporary storage while parsing
        Throws:
        java.io.IOException - if there is a reading error.
      • PreflightParser

        public PreflightParser​(java.lang.String filename)
                        throws java.io.IOException
        Constructor.
        Parameters:
        filename - the name of the PDF file to parse
        Throws:
        java.io.IOException - if there is a reading error.
      • PreflightParser

        public PreflightParser​(java.lang.String filename,
                               ScratchFile scratch)
                        throws java.io.IOException
        Constructor.
        Parameters:
        filename - the name of the PDF file to parse
        scratch - the ScratchFile used for temporary storage while parsing
        Throws:
        java.io.IOException - if there is a reading error.
      • PreflightParser

        public PreflightParser​(javax.activation.DataSource dataSource)
                        throws java.io.IOException
        Constructor. This one is slower than the file and the filename constructors, because a temporary file will be created.
        Parameters:
        dataSource - the datasource
        Throws:
        java.io.IOException - if there is a reading error.
      • PreflightParser

        public PreflightParser​(javax.activation.DataSource dataSource,
                               ScratchFile scratch)
                        throws java.io.IOException
        Constructor. This one is slower than the file and the filename constructors, because a temporary file will be created.
        Parameters:
        dataSource - the datasource
        scratch - the ScratchFile used for temporary storage while parsing
        Throws:
        java.io.IOException - if there is a reading error.
    • Method Detail

      • createUnknownErrorResult

        protected static ValidationResult createUnknownErrorResult()
        Creates a ValidationResult containing a ValidationError(UNKNOWN_ERROR).
        Returns:
        the ValidationResult instance.
      • addValidationError

        protected void addValidationError​(ValidationResult.ValidationError error)
        Adds the error to the ValidationResult. If the ValidationResult is null, a new instance is created, and the isWarning flag of the ValidationError determines whether the new ValidationResult is flagged as valid.
        Parameters:
        error - the validation error to add
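        The lazy creation described above could look roughly like the following sketch. SketchError, SketchResult and SketchCollector are hypothetical stand-ins, not the Preflight classes, and the rule that a non-warning error makes the result invalid is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a validation error; not the Preflight class. */
final class SketchError {
    final String code;
    final boolean warning;

    SketchError(String code, boolean warning) {
        this.code = code;
        this.warning = warning;
    }
}

/** Hypothetical stand-in for a validation result; not the Preflight class. */
final class SketchResult {
    boolean valid;
    final List<SketchError> errors = new ArrayList<>();

    SketchResult(boolean valid) {
        this.valid = valid;
    }

    void add(SketchError error) {
        // Assumption: any non-warning error makes the result invalid.
        valid = valid && error.warning;
        errors.add(error);
    }
}

/** Collects errors and creates the result only when the first error arrives. */
final class SketchCollector {
    SketchResult result; // stays null until the first error is reported

    void addValidationError(SketchError error) {
        if (result == null) {
            // A warning alone leaves the freshly created result flagged as valid.
            result = new SketchResult(error.warning);
        }
        result.add(error);
    }
}
```

        In this sketch a result that has only received warnings stays valid, and the first real error flips it to invalid.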
      • parse

        public void parse()
                   throws java.io.IOException
        Description copied from class: PDFParser
        This will parse the stream and populate the COSDocument object. This will close the keystore stream when it is done parsing.
        Overrides:
        parse in class PDFParser
        Throws:
        InvalidPasswordException - If the password is incorrect.
        java.io.IOException - If there is an error reading from the stream or corrupt data is found.
      • parse

        public void parse​(Format format)
                   throws java.io.IOException
        Parses the given file and checks whether it conforms to the given format.
        Parameters:
        format - format that the document should follow (default Format.PDF_A1B)
        Throws:
        java.io.IOException
      • parse

        public void parse​(Format format,
                          PreflightConfiguration config)
                   throws java.io.IOException
        Parses the given file and checks whether it conforms to the given format.
        Parameters:
        format - format that the document should follow (default Format.PDF_A1B)
        config - Configuration bean that will be used by the PreflightDocument. If null the format is used to determine the default configuration.
        Throws:
        java.io.IOException
      • createPdfADocument

        protected void createPdfADocument​(Format format,
                                          PreflightConfiguration config)
                                   throws java.io.IOException
        Throws:
        java.io.IOException
      • createContext

        protected void createContext()
        Create a validation context. This context is set to the PreflightDocument.
      • getPDDocument

        public PDDocument getPDDocument()
                                 throws java.io.IOException
        Description copied from class: PDFParser
        This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
        Overrides:
        getPDDocument in class PDFParser
        Returns:
        The document at the PD layer.
        Throws:
        java.io.IOException - If there is an error getting the document.
      • getPreflightDocument

        public PreflightDocument getPreflightDocument()
                                               throws java.io.IOException
        Throws:
        java.io.IOException
      • initialParse

        protected void initialParse()
                             throws java.io.IOException
        Description copied from class: PDFParser
        The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the PDF's objects. It can handle linearized PDFs, which will have an xref at the end pointing to an xref at the beginning of the file. Lastly, the root object is parsed.
        Overrides:
        initialParse in class PDFParser
        Throws:
        InvalidPasswordException - If the password is incorrect.
        java.io.IOException - If something went wrong.
      • checkPdfHeader

        protected void checkPdfHeader()
        Checks that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment with the PDF version (version 1.0 does not conform to the PDF/A specification). The second line must be a comment with at least 4 bytes greater than 0x80.
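        The two header rules can be illustrated with a small standalone check over a byte array. This is only a sketch: HeaderSketch and its methods are hypothetical names, the accepted version range 1.1 to 1.9 is an assumption (the description only rules out 1.0), and the byte threshold follows the wording of the description.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; not part of Preflight. */
final class HeaderSketch {

    /** Returns true if the first two lines of data satisfy the two header rules. */
    static boolean isHeaderValid(byte[] data) {
        int firstEnd = endOfLine(data, 0);
        String first = new String(data, 0, firstEnd, StandardCharsets.ISO_8859_1);
        // Assumption: versions 1.1 through 1.9 are accepted; 1.0 is rejected.
        if (!first.matches("%PDF-1\\.[1-9]")) {
            return false;
        }
        int secondStart = skipEol(data, firstEnd);
        int secondEnd = endOfLine(data, secondStart);
        // The second line must be a comment, so it has to start with '%'.
        if (secondStart >= data.length || data[secondStart] != '%') {
            return false;
        }
        int highBytes = 0;
        for (int i = secondStart + 1; i < secondEnd; i++) {
            if ((data[i] & 0xFF) > 0x80) {
                highBytes++;
            }
        }
        return highBytes >= 4;
    }

    /** Returns the index of the first CR or LF at or after from, or data.length. */
    private static int endOfLine(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\r' && data[i] != '\n') {
            i++;
        }
        return i;
    }

    /** Skips one end-of-line marker (CR, LF, or CR LF) starting at pos. */
    private static int skipEol(byte[] data, int pos) {
        if (pos < data.length && data[pos] == '\r') {
            pos++;
        }
        if (pos < data.length && data[pos] == '\n') {
            pos++;
        }
        return pos;
    }
}
```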
      • parseXrefTable

        protected boolean parseXrefTable​(long startByteOffset)
                                  throws java.io.IOException
        Same as COSParser.parseXrefTable(long), with additional controls: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header uses a single white space as separator, and so on.
        Overrides:
        parseXrefTable in class COSParser
        Parameters:
        startByteOffset - the offset to start at
        Returns:
        false on parsing error
        Throws:
        java.io.IOException - If an IO error occurs.
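        The two controls named in the description can be sketched as plain string checks. XrefSketch and its methods are hypothetical and operate on already decoded text, which is a simplification of parsing from a stream.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Preflight. */
final class XrefSketch {

    // First object number, exactly one space, number of entries.
    private static final Pattern SUBSECTION_HEADER = Pattern.compile("\\d+ \\d+");

    /** True if the 'xref' keyword at offset is immediately followed by CR or LF. */
    static boolean keywordFollowedByEol(String text, int offset) {
        if (!text.startsWith("xref", offset)) {
            return false;
        }
        int next = offset + "xref".length();
        return next < text.length()
                && (text.charAt(next) == '\r' || text.charAt(next) == '\n');
    }

    /** True if the subsection header line uses a single space as separator. */
    static boolean isStrictSubsectionHeader(String line) {
        return SUBSECTION_HEADER.matcher(line).matches();
    }
}
```

        A lenient parser would typically accept "xref 0 6" on one line or several spaces between the two numbers. Both are rejected here.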
      • checkStreamKeyWord

        protected void checkStreamKeyWord()
                                   throws java.io.IOException
        The 'stream' keyword must be followed by <CR><LF> or by <LF> alone.
        Throws:
        java.io.IOException
      • checkEndstreamKeyWord

        protected void checkEndstreamKeyWord()
                                      throws java.io.IOException
        'endstream' must be preceded by an EOL
        Throws:
        java.io.IOException
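        The rules for the 'stream' and 'endstream' keywords can be sketched as byte-level checks. StreamKeywordSketch is a hypothetical helper, and the offsets are assumed to point at the first byte of the respective keyword. Treating a lone CR before 'endstream' as an EOL is an assumption.

```java
/** Hypothetical helper; not part of Preflight. */
final class StreamKeywordSketch {

    /** True if the 'stream' keyword at offset is followed by CR LF or by LF alone. */
    static boolean streamKeywordOk(byte[] data, int offset) {
        int next = offset + "stream".length();
        if (next < data.length && data[next] == '\n') {
            return true; // lone LF
        }
        // CR is only acceptable when LF follows immediately.
        return next + 1 < data.length && data[next] == '\r' && data[next + 1] == '\n';
    }

    /** True if the byte just before the 'endstream' keyword at offset is CR or LF. */
    static boolean endstreamKeywordOk(byte[] data, int offset) {
        return offset > 0 && (data[offset - 1] == '\n' || data[offset - 1] == '\r');
    }
}
```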
      • nextIsEOL

        private boolean nextIsEOL()
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • parseCOSArray

        protected COSArray parseCOSArray()
                                  throws java.io.IOException
        Description copied from class: BaseParser
        This will parse a PDF array object.
        Overrides:
        parseCOSArray in class BaseParser
        Returns:
        The parsed PDF array.
        Throws:
        java.io.IOException - If there is an error parsing the stream.
      • parseCOSName

        protected COSName parseCOSName()
                                throws java.io.IOException
        Description copied from class: BaseParser
        This will parse a PDF name from the stream.
        Overrides:
        parseCOSName in class BaseParser
        Returns:
        The parsed PDF name.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • parseCOSString

        protected COSString parseCOSString()
                                    throws java.io.IOException
        Checks that the hexadecimal string contains an even number of hexadecimal characters. Once this is done, resets the offset to the beginning of the string and calls BaseParser.parseCOSString().
        Overrides:
        parseCOSString in class BaseParser
        Returns:
        The parsed PDF string.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
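        The even-length rule can be illustrated on a hexadecimal string literal held in a Java String. HexStringSketch is a hypothetical helper. Ignoring white space between digits and rejecting any other character are assumptions that the description does not spell out.

```java
/** Hypothetical helper; not part of Preflight. */
final class HexStringSketch {

    /**
     * Returns true if the literal, written as &lt;...&gt;, holds an even number
     * of hexadecimal digits and nothing else except white space.
     */
    static boolean hasEvenHexDigits(String literal) {
        int last = literal.length() - 1;
        if (last < 1 || literal.charAt(0) != '<' || literal.charAt(last) != '>') {
            throw new IllegalArgumentException("not a hexadecimal string literal");
        }
        int digits = 0;
        for (int i = 1; i < last; i++) {
            char c = literal.charAt(i);
            if (isHexDigit(c)) {
                digits++;
            } else if (!Character.isWhitespace(c)) {
                return false; // assumption: any other character is an error
            }
        }
        return digits % 2 == 0;
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
}
```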
      • parseDirObject

        protected COSBase parseDirObject()
                                  throws java.io.IOException
        Calls BaseParser.parseDirObject(), then checks the allowed range for Float and Integer values and the number of Dictionary entries.
        Overrides:
        parseDirObject in class BaseParser
        Returns:
        The parsed object.
        Throws:
        java.io.IOException - if there is an error during parsing.
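        A range check of this kind might look like the sketch below. LimitSketch is hypothetical, and the concrete limits are assumptions taken from the implementation limits in the PDF 1.4 reference. The description above does not state which values Preflight uses.

```java
import java.util.Map;

/** Hypothetical helper; the limit values are assumptions, not read from Preflight. */
final class LimitSketch {

    // Assumed limits (PDF 1.4 implementation limits).
    static final double MAX_REAL = 32767.0;
    static final int MAX_DICT_ENTRIES = 4095;

    /** True if the parsed integer fits into a signed 32-bit value. */
    static boolean integerInRange(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    /** True if the magnitude of the parsed real number is within the assumed limit. */
    static boolean realInRange(double value) {
        return Math.abs(value) <= MAX_REAL;
    }

    /** True if the dictionary does not exceed the assumed maximum number of entries. */
    static boolean dictionarySizeOk(Map<?, ?> dictionary) {
        return dictionary.size() <= MAX_DICT_ENTRIES;
    }
}
```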
      • parseObjectDynamically

        protected COSBase parseObjectDynamically​(long objNr,
                                                 int objGenNr,
                                                 boolean requireExistingNotCompressedObj)
                                          throws java.io.IOException
        Description copied from class: COSParser
        This will parse the next object from the stream and add it to the local state. It's reduced to parsing an indirect object.
        Overrides:
        parseObjectDynamically in class COSParser
        Parameters:
        objNr - object number of object to be parsed
        objGenNr - object generation number of object to be parsed
        requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
        Returns:
        the parsed object (which is also added to document object)
        Throws:
        java.io.IOException - If an IO error occurs.
      • lastIndexOf

        protected int lastIndexOf​(char[] pattern,
                                  byte[] buf,
                                  int endOff)
        Description copied from class: COSParser
        Searches for the last occurrence of pattern within the buffer. The lookup starts before endOff and goes back to offset 0.
        Overrides:
        lastIndexOf in class COSParser
        Parameters:
        pattern - pattern to search for
        buf - buffer to search pattern in
        endOff - offset (exclusive) where lookup starts at
        Returns:
        start offset of pattern within buffer or -1 if pattern could not be found
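        The backward search contract described above can be illustrated with a simplified reimplementation. LastIndexSketch is a hypothetical class written for this example, not the library code, and it assumes an ASCII pattern.

```java
/** Hypothetical, simplified reimplementation for illustration only. */
final class LastIndexSketch {

    /**
     * Returns the start offset of the last occurrence of pattern that ends
     * before endOff (exclusive), or -1 if there is none.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        // Try candidate start offsets from the last position where the
        // pattern still fits before endOff, going back to offset 0.
        for (int start = endOff - pattern.length; start >= 0; start--) {
            int i = 0;
            while (i < pattern.length && buf[start + i] == pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return start;
            }
        }
        return -1;
    }
}
```

        Because endOff is exclusive, an occurrence that would extend to or past endOff is skipped, and the search falls back to an earlier one.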