public class PreflightParser extends NonSequentialPDFParser
Modifier and Type | Field and Description |
---|---|
protected PreflightContext | ctx |
static Charset | encoding - Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. |
protected DataSource | originalDocument |
protected PreflightDocument | preflightDocument |
protected ValidationResult | validationResult |
DEFAULT_TRAIL_BYTECOUNT, EOF_MARKER, OBJ_MARKER, securityHandler, STARTXREF_MARKER, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX
isFDFDocment, xrefTrailerResolver
DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE
Constructor and Description |
---|
PreflightParser(DataSource input) |
PreflightParser(File file) |
PreflightParser(File file, RandomAccess rafi) |
PreflightParser(String filename) |
Modifier and Type | Method and Description |
---|---|
protected void | addValidationError(ValidationResult.ValidationError error) - Adds the error to the ValidationResult. |
protected void | addValidationErrors(List<ValidationResult.ValidationError> errors) |
protected void | checkEndstreamKeyWord() - 'endstream' must be preceded by an EOL. |
protected void | checkPdfHeader() - Checks that the PDF header matches the rules of the PDF/A specification. |
protected void | checkStreamKeyWord() - 'stream' must be followed by CR LF or by LF only. |
protected void | createContext() - Create a validation context. |
protected void | createPdfADocument(Format format, PreflightConfiguration config) |
protected static ValidationResult | createUnknownErrorResult() - Create an instance of ValidationResult with a ValidationError(UNKNOWN_ERROR) |
PDDocument | getPDDocument() - This will get the PD document that was parsed. |
PreflightDocument | getPreflightDocument() |
protected void | initialParse() - The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects. |
protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff) - Searches last appearance of pattern within buffer. |
protected boolean | nextIsEOL() |
protected boolean | nextIsSpace() |
void | parse() - This will parse the stream and populate the COSDocument object. |
void | parse(Format format) - Parse the given file and check whether it conforms to the given format. |
void | parse(Format format, PreflightConfiguration config) - Parse the given file and check whether it conforms to the given format. |
protected COSArray | parseCOSArray() - This will parse a PDF array object. |
protected COSName | parseCOSName() - This will parse a PDF name from the stream. |
protected COSStream | parseCOSStream(COSDictionary dic, RandomAccess file) - Wraps NonSequentialPDFParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary, org.apache.pdfbox.io.RandomAccess) to check the rules on the 'stream' and 'endstream' keywords. |
protected COSString | parseCOSString() - Checks that the hexadecimal string contains an even number of hexadecimal characters. |
protected COSString | parseCOSString(boolean isDictionary) - Deprecated. Not needed anymore. Use parseCOSString() instead. PDFBOX-1437 |
protected COSBase | parseDirObject() - Calls BaseParser.parseDirObject() and checks the limit ranges for Float and Integer values and the number of Dictionary entries. |
protected COSBase | parseObjectDynamically(int objNr, int objGenNr, boolean requireExistingNotCompressedObj) - This will parse the next object from the stream and add it to the local state. |
protected boolean | parseXrefTable(long startByteOffset) - Same as PDFParser.parseXrefTable(long), with additional checks: an EOL is mandatory after the 'xref' keyword, the cross-reference subsection header uses a single white space as separator, and so on. |
decrypt, decryptDictionary, decryptString, deleteTempFile, getPage, getPageNumber, getPdfFile, getSecurityHandler, getStartxrefOffset, isLenient, parseObjectDynamically, readPattern, releasePdfSourceInputStream, setEOFLookupRange, setLenient, setPdfSource
clearResources, getDocument, getFDFDocument, isContinueOnError, parseHeader, parseStartXref, parseTrailer, parseXrefStream, parseXrefStream, readVersionInTrailer, setTempDirectory
isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSDictionary, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces
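The two additional controls that the method summary names for parseXrefTable(long) can be illustrated with a small standalone sketch. It does not use PDFBox and is not the library's implementation; the class name XrefHeaderSketch and both helper methods are hypothetical, and the regular expression is only one assumed way to express "a single white space as separator".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: illustrates the stricter xref checks.
class XrefHeaderSketch {
    // Subsection header: first object number, exactly one SPACE (0x20), entry count.
    private static final Pattern SUBSECTION = Pattern.compile("\\d+ \\d+");

    /** True if the line is "<first object> <count>" separated by one space. */
    static boolean isStrictSubsectionHeader(String line) {
        return SUBSECTION.matcher(line).matches();
    }

    /** True if the 'xref' keyword is directly followed by an EOL marker. */
    static boolean xrefFollowedByEol(String text) {
        return text.startsWith("xref\r\n")
                || text.startsWith("xref\n")
                || text.startsWith("xref\r");
    }

    public static void main(String[] args) {
        System.out.println(isStrictSubsectionHeader("0 6"));
        System.out.println(xrefFollowedByEol("xref\n0 6\n"));
    }
}
```

A lenient parser would typically skip any amount of white space in both places; the sketch rejects such input instead, which is the kind of strictness the summary describes.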
public static final Charset encoding
protected DataSource originalDocument
protected ValidationResult validationResult
protected PreflightDocument preflightDocument
protected PreflightContext ctx
public PreflightParser(File file, RandomAccess rafi) throws IOException
IOException
public PreflightParser(File file) throws IOException
IOException
public PreflightParser(String filename) throws IOException
IOException
public PreflightParser(DataSource input) throws IOException
IOException
protected static ValidationResult createUnknownErrorResult()
protected void addValidationError(ValidationResult.ValidationError error)
error
protected void addValidationErrors(List<ValidationResult.ValidationError> errors)
public void parse() throws IOException
NonSequentialPDFParser
parse
in class NonSequentialPDFParser
IOException - If there is an error reading from the stream or corrupt data is found.
public void parse(Format format) throws IOException
format - format that the document should follow (default Format.PDF_A1B)
IOException
public void parse(Format format, PreflightConfiguration config) throws IOException
format - format that the document should follow (default Format.PDF_A1B)
config - Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
IOException
protected void createPdfADocument(Format format, PreflightConfiguration config) throws IOException
IOException
protected void createContext()
public PDDocument getPDDocument() throws IOException
NonSequentialPDFParser
getPDDocument
in class NonSequentialPDFParser
IOException - If there is an error getting the document.
public PreflightDocument getPreflightDocument() throws IOException
IOException
protected void initialParse() throws IOException
NonSequentialPDFParser
initialParse
in class NonSequentialPDFParser
IOException - If something went wrong.
protected void checkPdfHeader()
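As a rough illustration of what a header check against PDF/A rules involves, the sketch below tests two conditions from PDF/A-1: the '%' of the version header sits at byte offset 0, and the header line is immediately followed by a comment line made of '%' and at least four bytes above 127. This is a standalone sketch, not the code behind checkPdfHeader(); the class HeaderCheckSketch, its methods and the version pattern are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: sketches a PDF/A-1 style header check.
class HeaderCheckSketch {
    /** True if the first two lines look like a PDF/A-1 compatible header. */
    static boolean looksLikePdfAHeader(byte[] data) {
        int eol = indexOfEol(data);
        if (eol < 0) {
            return false;
        }
        // '%' must be the very first byte, followed by "PDF-1." and one digit
        // (the exact version pattern is an assumption of this sketch).
        String first = new String(data, 0, eol, StandardCharsets.ISO_8859_1);
        if (!first.matches("%PDF-1\\.[0-9]")) {
            return false;
        }
        // Skip the EOL marker: CR, LF or CR LF.
        int next = eol + 1;
        if (data[eol] == '\r' && next < data.length && data[next] == '\n') {
            next++;
        }
        // Comment line: '%' followed by at least four bytes greater than 127.
        if (next + 5 > data.length || data[next] != '%') {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            if ((data[next + i] & 0xFF) <= 127) {
                return false;
            }
        }
        return true;
    }

    /** Index of the first CR or LF, or -1 if there is none. */
    static int indexOfEol(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n' || data[i] == '\r') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] sample = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        System.out.println(looksLikePdfAHeader(sample));
    }
}
```

In this class such a failure would presumably be recorded through addValidationError rather than returned as a boolean; the boolean return only keeps the sketch self-contained.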
protected boolean parseXrefTable(long startByteOffset) throws IOException
parseXrefTable
in class PDFParser
startByteOffset - the offset to start at
IOException - If an IO error occurs.
protected COSStream parseCOSStream(COSDictionary dic, RandomAccess file) throws IOException
Wraps NonSequentialPDFParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary, org.apache.pdfbox.io.RandomAccess) to check the rules on the 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord().
parseCOSStream
in class NonSequentialPDFParser
dic - dictionary that goes with this stream.
file - file to write the stream to when reading.
IOException - if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
protected void checkStreamKeyWord() throws IOException
IOException
protected void checkEndstreamKeyWord() throws IOException
IOException
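The rules behind checkStreamKeyWord() and checkEndstreamKeyWord() are simple byte-level conditions: in the PDF specification the 'stream' keyword is followed by CR LF or a lone LF, and PDF/A additionally requires an EOL marker in front of 'endstream'. The sketch below shows both conditions on a plain byte array. It is an illustration under those assumptions, not the library code; StreamKeywordSketch and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: sketches the stream keyword rules.
class StreamKeywordSketch {
    /** True if the bytes right after 'stream' (at keywordEnd) are CR LF or LF. */
    static boolean streamFollowedByEol(byte[] data, int keywordEnd) {
        if (keywordEnd >= data.length) {
            return false;
        }
        if (data[keywordEnd] == '\n') {
            return true;
        }
        // A CR alone is not accepted; it must be followed by LF.
        return data[keywordEnd] == '\r'
                && keywordEnd + 1 < data.length
                && data[keywordEnd + 1] == '\n';
    }

    /** True if the byte just before 'endstream' (at keywordStart) is CR or LF. */
    static boolean endstreamPrecededByEol(byte[] data, int keywordStart) {
        if (keywordStart <= 0 || keywordStart > data.length) {
            return false;
        }
        byte prev = data[keywordStart - 1];
        return prev == '\n' || prev == '\r';
    }

    public static void main(String[] args) {
        String pdf = "<< /Length 3 >>\nstream\r\nabc\nendstream";
        byte[] data = pdf.getBytes(StandardCharsets.ISO_8859_1);
        int afterStream = pdf.indexOf("stream") + "stream".length();
        int endstream = pdf.indexOf("endstream");
        System.out.println(streamFollowedByEol(data, afterStream));
        System.out.println(endstreamPrecededByEol(data, endstream));
    }
}
```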
protected boolean nextIsEOL() throws IOException
IOException
protected boolean nextIsSpace() throws IOException
IOException
protected COSArray parseCOSArray() throws IOException
BaseParser
parseCOSArray
in class BaseParser
IOException - If there is an error parsing the stream.
protected COSName parseCOSName() throws IOException
BaseParser
parseCOSName
in class BaseParser
IOException - If there is an error reading from the stream.
@Deprecated protected COSString parseCOSString(boolean isDictionary) throws IOException
Deprecated. Not needed anymore. Use parseCOSString() instead. PDFBOX-1437
BaseParser.parseCOSString()
parseCOSString
in class BaseParser
isDictionary - indicates if the stream is a dictionary or not
IOException - If there is an error reading from the stream.
protected COSString parseCOSString() throws IOException
BaseParser.parseCOSString()
parseCOSString
in class BaseParser
IOException - If there is an error reading from the stream.
protected COSBase parseDirObject() throws IOException
Calls BaseParser.parseDirObject() and checks the limit ranges for Float and Integer values and the number of Dictionary entries.
parseDirObject
in class BaseParser
IOException - If there is an error during parsing.
protected COSBase parseObjectDynamically(int objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
NonSequentialPDFParser
This will parse the next object from the stream and add it to the local state. It is taken from PDFParser and reduced to parsing an indirect object.
parseObjectDynamically
in class NonSequentialPDFParser
objNr - object number of object to be parsed
objGenNr - object generation number of object to be parsed
requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
IOException - If an IO error occurs.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
NonSequentialPDFParser
lastIndexOf
in class NonSequentialPDFParser
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
-1 if pattern could not be found
Copyright © 2002–2015 The Apache Software Foundation. All rights reserved.