Class PreflightParser
java.lang.Object
  org.apache.pdfbox.pdfparser.BaseParser
    org.apache.pdfbox.pdfparser.COSParser
      org.apache.pdfbox.pdfparser.PDFParser
        org.apache.pdfbox.preflight.parser.PreflightParser
public class PreflightParser extends PDFParser
-
-
Field Summary
Modifier and Type, Field, Description
protected PreflightContext ctx
protected javax.activation.DataSource dataSource
static java.nio.charset.Charset encoding
    Defines a one-byte encoding that has no specific encoding in the UTF-8 charset.
protected PreflightDocument preflightDocument
protected ValidationResult validationResult
-
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, source, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX, xrefTrailerResolver
-
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, STREAM_STRING, T
-
-
Constructor Summary
Constructor, Description
PreflightParser(java.io.File file)
    Constructor.
PreflightParser(java.io.File file, ScratchFile scratch)
    Constructor.
PreflightParser(java.lang.String filename)
    Constructor.
PreflightParser(java.lang.String filename, ScratchFile scratch)
    Constructor.
PreflightParser(javax.activation.DataSource dataSource)
    Constructor.
PreflightParser(javax.activation.DataSource dataSource, ScratchFile scratch)
    Constructor.
-
Method Summary
Modifier and Type, Method, Description
protected void addValidationError(ValidationResult.ValidationError error)
    Add the error to the ValidationResult.
protected void addValidationErrors(java.util.List<ValidationResult.ValidationError> errors)
protected void checkEndstreamKeyWord()
    'endstream' must be preceded by an EOL.
protected void checkPdfHeader()
    Check that the PDF header matches the rules of the PDF/A specification.
protected void checkStreamKeyWord()
    'stream' must be followed by <CR><LF> or only <LF>.
protected void createContext()
    Create a validation context.
protected void createPdfADocument(Format format, PreflightConfiguration config)
protected static ValidationResult createUnknownErrorResult()
    Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR).
PDDocument getPDDocument()
    This will get the PD document that was parsed.
PreflightDocument getPreflightDocument()
protected void initialParse()
    The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
    Searches last appearance of pattern within buffer.
private boolean nextIsEOL()
void parse()
    This will parse the stream and populate the COSDocument object.
void parse(Format format)
    Parse the given file and check whether it conforms to the given format.
void parse(Format format, PreflightConfiguration config)
    Parse the given file and check whether it conforms to the given format.
protected COSArray parseCOSArray()
    This will parse a PDF array object.
protected COSName parseCOSName()
    This will parse a PDF name from the stream.
protected COSStream parseCOSStream(COSDictionary dic)
    Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on 'stream' and 'endstream' keywords.
protected COSString parseCOSString()
    Check that the hexadecimal string contains an even number of hexadecimal characters.
protected COSBase parseDirObject()
    Calls BaseParser.parseDirObject() and checks the limit ranges for Float, Integer and the number of Dictionary entries.
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj)
    This will parse the next object from the stream and add it to the local state.
protected boolean parseXrefTable(long startByteOffset)
    Same as COSParser.parseXrefTable(long), with additional controls: EOL mandatory after the 'xref' keyword, cross reference subsection header uses single white space as separator, and so on.
-
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, getAccessPermission, getDocument, getEncryption, getStartxrefOffset, isCatalog, isLenient, parseDictObjects, parseFDFHeader, parseObjectDynamically, parsePDFHeader, parseTrailerValuesDynamically, parseXref, rebuildTrailer, retrieveTrailer, setEOFLookupRange, setLenient
-
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces
-
-
-
-
Field Detail
-
encoding
public static final java.nio.charset.Charset encoding
Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. This avoids unexpected errors when the encoding is Cp5816.
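To illustrate why a one-byte charset matters when raw PDF bytes pass through a String, here is a minimal sketch. It assumes ISO-8859-1 as the one-byte charset (this page does not name the charset behind encoding), and OneByteCharsetSketch is a hypothetical helper, not part of PDFBox.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox.
final class OneByteCharsetSketch {
    private OneByteCharsetSketch() {}

    // Returns true if every byte value 0..255 survives bytes -> String -> bytes.
    static boolean roundTrips(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, cs).getBytes(cs);
        return Arrays.equals(all, back);
    }
}
```

With this sketch, roundTrips(StandardCharsets.ISO_8859_1) holds, while roundTrips(StandardCharsets.UTF_8) does not, because lone bytes above 0x7F are not valid UTF-8 and get replaced during decoding.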
-
dataSource
protected javax.activation.DataSource dataSource
-
validationResult
protected ValidationResult validationResult
-
preflightDocument
protected PreflightDocument preflightDocument
-
ctx
protected PreflightContext ctx
-
-
Constructor Detail
-
PreflightParser
public PreflightParser(java.io.File file) throws java.io.IOException
Constructor.
Parameters:
file
Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(java.io.File file, ScratchFile scratch) throws java.io.IOException
Constructor.
Parameters:
file
scratch
Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(java.lang.String filename) throws java.io.IOException
Constructor.
Parameters:
filename
Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(java.lang.String filename, ScratchFile scratch) throws java.io.IOException
Constructor.
Parameters:
filename
scratch
Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(javax.activation.DataSource dataSource) throws java.io.IOException
Constructor. This one is slower than the file and filename constructors because a temporary file is created.
Parameters:
dataSource - the datasource
Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(javax.activation.DataSource dataSource, ScratchFile scratch) throws java.io.IOException
Constructor. This one is slower than the file and filename constructors because a temporary file is created.
Parameters:
dataSource - the datasource
scratch
Throws:
java.io.IOException - if there is a reading error.
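The extra cost of the DataSource constructors comes from copying the stream to a temporary file first. A minimal sketch of that general pattern with plain java.nio follows; TempFileSpooler and its file prefix are hypothetical, and this is not the code PDFBox runs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of PDFBox: spools a stream to a temporary file.
final class TempFileSpooler {
    private TempFileSpooler() {}

    static Path spool(InputStream in) {
        Path tmp = null;
        try {
            // createTempFile already creates the file, hence REPLACE_EXISTING below.
            tmp = Files.createTempFile("preflight-sketch", ".pdf");
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            if (tmp != null) {
                tmp.toFile().delete(); // best-effort cleanup of the partial copy
            }
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller owns the returned path and should delete it when parsing is finished. When the PDF is already on disk, the File and filename constructors avoid this copy.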
-
-
Method Detail
-
createUnknownErrorResult
protected static ValidationResult createUnknownErrorResult()
Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR).
Returns:
the ValidationResult instance.
-
addValidationError
protected void addValidationError(ValidationResult.ValidationError error)
Adds the error to the ValidationResult. If validationResult is null, a new instance is created, and the isWarning flag of the ValidationError determines whether the ValidationResult is flagged as valid.
Parameters:
error
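The lazy creation described for addValidationError can be sketched as follows. SketchError, SketchResult and ErrorCollector are hypothetical stand-ins for the PDFBox types. The rule that a later non-warning error clears the valid flag is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ValidationResult.ValidationError.
final class SketchError {
    final String code;
    final boolean warning;

    SketchError(String code, boolean warning) {
        this.code = code;
        this.warning = warning;
    }
}

// Hypothetical stand-in for ValidationResult.
final class SketchResult {
    final List<SketchError> errors = new ArrayList<>();
    boolean valid;

    SketchResult(boolean valid) {
        this.valid = valid;
    }
}

final class ErrorCollector {
    SketchResult result; // stays null until the first error arrives

    void add(SketchError error) {
        if (result == null) {
            // First entry: a warning alone leaves the result flagged as valid.
            result = new SketchResult(error.warning);
        } else if (!error.warning) {
            // Assumption: a later non-warning error invalidates the result.
            result.valid = false;
        }
        result.errors.add(error);
    }
}
```

Adding a warning first leaves valid set to true. Adding a non-warning error afterwards clears it.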
-
addValidationErrors
protected void addValidationErrors(java.util.List<ValidationResult.ValidationError> errors)
-
parse
public void parse() throws java.io.IOException
Description copied from class: PDFParser
This will parse the stream and populate the COSDocument object. This will close the keystore stream when it is done parsing.
Overrides:
parse in class PDFParser
Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If there is an error reading from the stream or corrupt data is found.
-
parse
public void parse(Format format) throws java.io.IOException
Parse the given file and check whether it conforms to the given format.
Parameters:
format - format that the document should follow (default Format.PDF_A1B)
Throws:
java.io.IOException
-
parse
public void parse(Format format, PreflightConfiguration config) throws java.io.IOException
Parse the given file and check whether it conforms to the given format.
Parameters:
format - format that the document should follow (default Format.PDF_A1B)
config - Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
Throws:
java.io.IOException
-
createPdfADocument
protected void createPdfADocument(Format format, PreflightConfiguration config) throws java.io.IOException
Throws:
java.io.IOException
-
createContext
protected void createContext()
Create a validation context. This context is set to the PreflightDocument.
-
getPDDocument
public PDDocument getPDDocument() throws java.io.IOException
Description copied from class: PDFParser
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
Overrides:
getPDDocument in class PDFParser
Returns:
The document at the PD layer.
Throws:
java.io.IOException - If there is an error getting the document.
-
getPreflightDocument
public PreflightDocument getPreflightDocument() throws java.io.IOException
Throws:
java.io.IOException
-
initialParse
protected void initialParse() throws java.io.IOException
Description copied from class: PDFParser
The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects. It can handle linearized pdfs, which will have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
Overrides:
initialParse in class PDFParser
Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If something went wrong.
-
checkPdfHeader
protected void checkPdfHeader()
Check that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment with the PDF version (version 1.0 does not conform to the PDF/A specification). The second line must be a comment with at least 4 bytes greater than 0x80.
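The two header rules can be sketched as a standalone check over the first bytes of a file. PdfHeaderSketch is hypothetical and is not the PDFBox implementation. It assumes that versions 1.1 through 1.9 are acceptable in the first line and that CR, LF or CR LF ends a line.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class PdfHeaderSketch {
    private PdfHeaderSketch() {}

    static boolean looksLikePdfAHeader(byte[] data) {
        int eol = indexOfEol(data, 0);
        if (eol < 0) {
            return false;
        }
        // Rule 1: the first line (offset 0) is a version comment, and 1.0 is rejected.
        String first = new String(data, 0, eol, StandardCharsets.ISO_8859_1);
        if (!first.matches("%PDF-1\\.[1-9]")) {
            return false;
        }
        // Rule 2: the second line is a comment with at least 4 bytes greater than 0x80.
        int start = skipEol(data, eol);
        if (start >= data.length || data[start] != '%') {
            return false;
        }
        int end = indexOfEol(data, start);
        if (end < 0) {
            end = data.length;
        }
        int high = 0;
        for (int i = start + 1; i < end; i++) {
            if ((data[i] & 0xFF) > 0x80) {
                high++;
            }
        }
        return high >= 4;
    }

    private static int indexOfEol(byte[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == '\r' || data[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    private static int skipEol(byte[] data, int pos) {
        if (pos < data.length && data[pos] == '\r') {
            pos++;
        }
        if (pos < data.length && data[pos] == '\n') {
            pos++;
        }
        return pos;
    }
}
```

The bytes are decoded as ISO-8859-1 so that each byte maps to exactly one character before the version pattern is applied.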
-
parseXrefTable
protected boolean parseXrefTable(long startByteOffset) throws java.io.IOException
Same as COSParser.parseXrefTable(long), with additional controls:
- EOL mandatory after the 'xref' keyword
- Cross reference subsection header uses single white space as separator
- and so on
Overrides:
parseXrefTable in class COSParser
Parameters:
startByteOffset - the offset to start at
Returns:
false on parsing error
Throws:
java.io.IOException - If an IO error occurs.
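The two controls named above can be sketched as simple predicates over text. XrefSyntaxSketch is hypothetical and not the PDFBox implementation. It assumes that CR, LF and CR LF all count as an EOL and that the single white space is a plain space character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class XrefSyntaxSketch {
    // Subsection header: first object number, exactly one space, entry count.
    private static final Pattern SUBSECTION = Pattern.compile("\\d+ \\d+");

    private XrefSyntaxSketch() {}

    // True if the text starts with 'xref' immediately followed by an EOL.
    static boolean keywordFollowedByEol(String text) {
        return text.startsWith("xref\r") || text.startsWith("xref\n");
    }

    // True if a subsection header line such as "0 6" uses a single space separator.
    static boolean subsectionHeaderIsStrict(String line) {
        return SUBSECTION.matcher(line).matches();
    }
}
```

A lenient parser would also accept "xref 0 6" on one line or a tab between the two numbers. Both predicates reject those forms.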
-
parseCOSStream
protected COSStream parseCOSStream(COSDictionary dic) throws java.io.IOException
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check rules on 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord().
Overrides:
parseCOSStream in class COSParser
Parameters:
dic - dictionary that goes with this stream.
Returns:
parsed pdf stream.
Throws:
java.io.IOException - if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
-
checkStreamKeyWord
protected void checkStreamKeyWord() throws java.io.IOException
'stream' must be followed by <CR><LF> or only <LF>.
Throws:
java.io.IOException
-
checkEndstreamKeyWord
protected void checkEndstreamKeyWord() throws java.io.IOException
'endstream' must be preceded by an EOL.
Throws:
java.io.IOException
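The keyword rules for 'stream' and 'endstream' can be sketched as byte-level predicates. StreamKeywordSketch is hypothetical and not the PDFBox implementation. Treating a lone CR before 'endstream' as an EOL is an assumption.

```java
// Hypothetical helper, not part of PDFBox.
final class StreamKeywordSketch {
    private StreamKeywordSketch() {}

    // pos is the offset just after the 'stream' keyword.
    // Accepts LF or CR LF; a lone CR is rejected.
    static boolean followedByCrLfOrLf(byte[] data, int pos) {
        if (pos < 0 || pos >= data.length) {
            return false;
        }
        if (data[pos] == '\n') {
            return true;
        }
        return data[pos] == '\r' && pos + 1 < data.length && data[pos + 1] == '\n';
    }

    // keywordStart is the offset of the 'e' in 'endstream'.
    static boolean precededByEol(byte[] data, int keywordStart) {
        if (keywordStart <= 0 || keywordStart > data.length) {
            return false;
        }
        byte prev = data[keywordStart - 1];
        return prev == '\n' || prev == '\r';
    }
}
```

Both methods only inspect the bytes next to the keyword, so the caller is expected to have located the keyword already.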
-
nextIsEOL
private boolean nextIsEOL() throws java.io.IOException
Throws:
java.io.IOException
-
parseCOSArray
protected COSArray parseCOSArray() throws java.io.IOException
Description copied from class: BaseParser
This will parse a PDF array object.
Overrides:
parseCOSArray in class BaseParser
Returns:
The parsed PDF array.
Throws:
java.io.IOException - If there is an error parsing the stream.
-
parseCOSName
protected COSName parseCOSName() throws java.io.IOException
Description copied from class: BaseParser
This will parse a PDF name from the stream.
Overrides:
parseCOSName in class BaseParser
Returns:
The parsed PDF name.
Throws:
java.io.IOException - If there is an error reading from the stream.
-
parseCOSString
protected COSString parseCOSString() throws java.io.IOException
Check that the hexadecimal string contains an even number of hexadecimal characters. Once that is done, reset the offset to the beginning of the string and call BaseParser.parseCOSString().
Overrides:
parseCOSString in class BaseParser
Returns:
The parsed PDF string.
Throws:
java.io.IOException - If there is an error reading from the stream.
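The even-count rule for hexadecimal strings can be sketched on a single token such as <48656C6C6F>. HexStringSketch is hypothetical and not the PDFBox implementation. Ignoring white space inside the angle brackets is an assumption.

```java
// Hypothetical helper, not part of PDFBox.
final class HexStringSketch {
    private HexStringSketch() {}

    // token is expected to include the enclosing '<' and '>'.
    static boolean hasEvenHexDigitCount(String token) {
        int last = token.length() - 1;
        if (last < 1 || token.charAt(0) != '<' || token.charAt(last) != '>') {
            return false;
        }
        int digits = 0;
        for (int i = 1; i < last; i++) {
            char c = token.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // assumption: white space between digits is ignored
            }
            boolean hex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
            digits++;
        }
        return digits % 2 == 0;
    }
}
```

An odd digit count means the last byte is only half specified, which is what this check rejects.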
-
parseDirObject
protected COSBase parseDirObject() throws java.io.IOException
Calls BaseParser.parseDirObject() and checks the limit ranges for Float, Integer and the number of Dictionary entries.
Overrides:
parseDirObject in class BaseParser
Returns:
The parsed object.
Throws:
java.io.IOException - if there is an error during parsing.
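This page does not state the limit values. The sketch below assumes the implementation limits of the PDF 1.4 reference: integers within the signed 32-bit range, reals within plus or minus 32767, and at most 4095 dictionary entries. LimitSketch and its constants are hypothetical and should be checked against the PDFBox sources before relying on them.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of PDFBox. All limits below are assumptions.
final class LimitSketch {
    static final int MAX_DICT_ENTRIES = 4095; // assumed limit
    static final double MAX_REAL = 32767.0;   // assumed limit (absolute value)

    private static final BigInteger MIN_INT = BigInteger.valueOf(Integer.MIN_VALUE);
    private static final BigInteger MAX_INT = BigInteger.valueOf(Integer.MAX_VALUE);

    private LimitSketch() {}

    // literal is a decimal integer as it appears in the file, e.g. "-42".
    static boolean integerInRange(String literal) {
        BigInteger v = new BigInteger(literal);
        return v.compareTo(MIN_INT) >= 0 && v.compareTo(MAX_INT) <= 0;
    }

    static boolean realInRange(double value) {
        return Math.abs(value) <= MAX_REAL;
    }

    static boolean dictionarySizeInRange(int entries) {
        return entries >= 0 && entries <= MAX_DICT_ENTRIES;
    }
}
```

The integer check works on the literal text because a value outside the 32-bit range could not be held in an int in the first place.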
-
parseObjectDynamically
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws java.io.IOException
Description copied from class: COSParser
This will parse the next object from the stream and add it to the local state. It's reduced to parsing an indirect object.
Overrides:
parseObjectDynamically in class COSParser
Parameters:
objNr - object number of object to be parsed
objGenNr - object generation number of object to be parsed
requireExistingNotCompressedObj - if true, the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
Returns:
the parsed object (which is also added to document object)
Throws:
java.io.IOException - If an IO error occurs.
-
lastIndexOf
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches last appearance of pattern within buffer. The lookup starts before endOff and goes back to 0.
Overrides:
lastIndexOf in class COSParser
Parameters:
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
Returns:
start offset of pattern within buffer or -1 if pattern could not be found
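A backward search with the same signature can be sketched as below. LastIndexSketch is hypothetical and not the PDFBox implementation. It assumes that a match must lie entirely before endOff and that an empty pattern counts as not found.

```java
// Hypothetical helper, not part of PDFBox.
final class LastIndexSketch {
    private LastIndexSketch() {}

    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        if (pattern.length == 0) {
            return -1; // assumption: an empty pattern is treated as not found
        }
        int limit = Math.min(endOff, buf.length);
        // Try every candidate start offset, from the last possible one down to 0.
        for (int start = limit - pattern.length; start >= 0; start--) {
            int i = 0;
            while (i < pattern.length && (buf[start + i] & 0xFF) == pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return start;
            }
        }
        return -1;
    }
}
```

Searching from the end suits markers such as %%EOF, where the last occurrence in the buffer is the one of interest.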
-
-