org.apache.poi.hwpf
Class HWPFDocument

java.lang.Object
  extended by org.apache.poi.POIDocument
      extended by org.apache.poi.hwpf.HWPFDocumentCore
          extended by org.apache.poi.hwpf.HWPFDocument

public final class HWPFDocument
extends HWPFDocumentCore

This class acts as the bucket that we throw all of the Word data structures into.

Author:
Ryan Ackley

Field Summary
protected  ComplexFileTable _cft
          Contains text of the document wrapped in an obfuscated Word data structure
protected  CPSplitCalculator _cpSplit
          Used for making sense of the character position (CP) lengths in the FIB
protected  byte[] _dataStream
          data stream buffer
protected  EscherRecordHolder _dgg
          Escher Drawing Group information
protected  DocumentProperties _dop
          Document wide Properties
protected  FSPATable _fspa
          Holds FSPA (shape) information
protected  ShapesTable _officeArts
          Holds Office Art objects
protected  PicturesTable _pictures
          Holds pictures table
protected  RevisionMarkAuthorTable _rmat
          Holds the revision mark authors for this document.
protected  SavedByTable _sbt
          Holds the save history for this document.
protected  byte[] _tableStream
          table stream buffer
protected  TextPieceTable _tpt
           
 
Fields inherited from class org.apache.poi.hwpf.HWPFDocumentCore
_cbt, _fib, _ft, _lt, _mainStream, _pbt, _ss, _st
 
Fields inherited from class org.apache.poi.POIDocument
directory, filesystem
 
Constructor Summary
protected HWPFDocument()
           
  HWPFDocument(DirectoryNode directory, POIFSFileSystem pfilesystem)
          This constructor loads a Word document from a specific point in a POIFSFileSystem, usually not the root directory.
  HWPFDocument(java.io.InputStream istream)
          This constructor loads a Word document from an InputStream.
  HWPFDocument(POIFSFileSystem pfilesystem)
          This constructor loads a Word document from a POIFSFileSystem
 
Method Summary
 int characterLength()
          Returns the character length of a document.
 void delete(int start, int length)
           
 Range getCommentsRange()
          Returns the range which covers all the comments.
 CPSplitCalculator getCPSplitCalculator()
           
 byte[] getDataStream()
           
 DocumentProperties getDocProperties()
           
 Range getEndnoteRange()
          Returns the range which covers all the Endnotes.
 Range getFootnoteRange()
          Returns the range which covers all the Footnotes.
 Range getHeaderStoryRange()
          Returns the range which covers all "Header Stories".
 Range getOverallRange()
          Returns the range that covers all text in the file, including main text, footnotes, headers and comments
 PicturesTable getPicturesTable()
           
 Range getRange()
          Returns the range which covers the whole of the document, but excludes any headers and footers.
 RevisionMarkAuthorTable getRevisionMarkAuthorTable()
          Gets a reference to the revision mark author table, which holds the revision mark authors for the document.
 SavedByTable getSavedByTable()
          Gets a reference to the saved-by table, which holds the save history for the document.
 ShapesTable getShapesTable()
           
 byte[] getTableStream()
           
 TextPieceTable getTextTable()
           
static void main(java.lang.String[] args)
          Takes two arguments: 1) the name of the Word file to read in, and 2) the location to write it out to.
 int registerList(HWPFList list)
           
 void write(java.io.OutputStream out)
          Writes out the Word file that is represented by an instance of this class.
 
Methods inherited from class org.apache.poi.hwpf.HWPFDocumentCore
getCharacterTable, getFileInformationBlock, getFontTable, getListTables, getParagraphTable, getSectionTable, getStyleSheet, verifyAndBuildPOIFS
 
Methods inherited from class org.apache.poi.POIDocument
copyNodes, createInformationProperties, getDocumentSummaryInformation, getPropertySet, getSummaryInformation, readProperties, writeProperties, writeProperties, writePropertySet
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_cpSplit

protected CPSplitCalculator _cpSplit
Used for making sense of the character position (CP) lengths in the FIB


_tableStream

protected byte[] _tableStream
table stream buffer


_dataStream

protected byte[] _dataStream
data stream buffer


_dop

protected DocumentProperties _dop
Document wide Properties


_cft

protected ComplexFileTable _cft
Contains text of the document wrapped in an obfuscated Word data structure


_tpt

protected TextPieceTable _tpt

_sbt

protected SavedByTable _sbt
Holds the save history for this document.


_rmat

protected RevisionMarkAuthorTable _rmat
Holds the revision mark authors for this document.


_pictures

protected PicturesTable _pictures
Holds pictures table


_fspa

protected FSPATable _fspa
Holds FSPA (shape) information


_dgg

protected EscherRecordHolder _dgg
Escher Drawing Group information


_officeArts

protected ShapesTable _officeArts
Holds Office Art objects

Constructor Detail

HWPFDocument

protected HWPFDocument()

HWPFDocument

public HWPFDocument(java.io.InputStream istream)
             throws java.io.IOException
This constructor loads a Word document from an InputStream.

Parameters:
istream - The InputStream that contains the Word document.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in InputStream.

HWPFDocument

public HWPFDocument(POIFSFileSystem pfilesystem)
             throws java.io.IOException
This constructor loads a Word document from a POIFSFileSystem

Parameters:
pfilesystem - The POIFSFileSystem that contains the Word document.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.

HWPFDocument

public HWPFDocument(DirectoryNode directory,
                    POIFSFileSystem pfilesystem)
             throws java.io.IOException
This constructor loads a Word document from a specific point in a POIFSFileSystem, usually not the root directory. Typically used to open embedded documents.

Parameters:
directory - The DirectoryNode that contains the Word document.
pfilesystem - The POIFSFileSystem that contains the Word document.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.
Method Detail

getTextTable

public TextPieceTable getTextTable()
Specified by:
getTextTable in class HWPFDocumentCore

getCPSplitCalculator

public CPSplitCalculator getCPSplitCalculator()

getDocProperties

public DocumentProperties getDocProperties()

getOverallRange

public Range getOverallRange()
Returns the range that covers all text in the file, including main text, footnotes, headers and comments


getRange

public Range getRange()
Returns the range which covers the whole of the document, but excludes any headers and footers.

Specified by:
getRange in class HWPFDocumentCore

getFootnoteRange

public Range getFootnoteRange()
Returns the range which covers all the Footnotes.


getEndnoteRange

public Range getEndnoteRange()
Returns the range which covers all the Endnotes.


getCommentsRange

public Range getCommentsRange()
Returns the range which covers all the comments.


getHeaderStoryRange

public Range getHeaderStoryRange()
Returns the range which covers all "Header Stories". A header story contains a header, a footer, endnote separators and footnote separators.
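
One way to picture how the range getters above relate to each other: a minimal sketch, assuming the stories are stored back to back in a single character-position (CP) space, in the order main text, footnotes, header stories, comments, endnotes. CpLayoutSketch, Span and layout are hypothetical names used only for illustration and are not part of Apache POI; the lengths are made-up sample values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of consecutive story ranges; not an Apache POI class. */
final class CpLayoutSketch {

    /** Half-open span [start, end) of character positions. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int length() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Lays the stories out back to back, in iteration order, starting at CP 0.
     * Assumption: each story begins where the previous one ends.
     */
    static Map<String, Span> layout(Map<String, Integer> storyLengths) {
        Map<String, Span> spans = new LinkedHashMap<>();
        int cursor = 0;
        for (Map.Entry<String, Integer> entry : storyLengths.entrySet()) {
            int next = cursor + entry.getValue();
            spans.put(entry.getKey(), new Span(cursor, next));
            cursor = next;
        }
        return spans;
    }

    public static void main(String[] args) {
        // Sample lengths only; a real document would supply these from the FIB.
        Map<String, Integer> lengths = new LinkedHashMap<>();
        lengths.put("main", 120);
        lengths.put("footnotes", 30);
        lengths.put("headerStories", 18);
        lengths.put("comments", 0);
        lengths.put("endnotes", 12);

        Map<String, Span> spans = layout(lengths);
        spans.forEach((name, span) -> System.out.println(name + " " + span));
    }
}
```

Under this model, the main-text span corresponds to what getRange() covers, while the overall range runs from CP 0 to the end of the last story.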


characterLength

public int characterLength()
Returns the character length of a document.

Returns:
the character length of a document

getSavedByTable

public SavedByTable getSavedByTable()
Gets a reference to the saved-by table, which holds the save history for the document.

Returns:
the saved-by table.

getRevisionMarkAuthorTable

public RevisionMarkAuthorTable getRevisionMarkAuthorTable()
Gets a reference to the revision mark author table, which holds the revision mark authors for the document.

Returns:
the revision mark author table.

getPicturesTable

public PicturesTable getPicturesTable()
Returns:
a PicturesTable object that can extract images from this document

getShapesTable

public ShapesTable getShapesTable()
Returns:
a ShapesTable object that can extract Office Art shapes from this document

write

public void write(java.io.OutputStream out)
           throws java.io.IOException
Writes out the Word file that is represented by an instance of this class.

Specified by:
write in class POIDocument
Parameters:
out - The OutputStream to write to.
Throws:
java.io.IOException - If there is an unexpected IOException from the passed in OutputStream.

getDataStream

public byte[] getDataStream()

getTableStream

public byte[] getTableStream()

registerList

public int registerList(HWPFList list)

delete

public void delete(int start,
                   int length)
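
The two parameters suggest an offset and a count. A minimal sketch of that reading, assuming start is a zero-based character offset and length is the number of characters to remove (not an end offset). DeleteSketch is a hypothetical stand-in that works on a plain String, not on an HWPFDocument, and the bounds check is an illustrative choice rather than documented behaviour.

```java
/** Hypothetical illustration of (start, length) deletion; not an Apache POI class. */
final class DeleteSketch {

    /** Removes {@code length} characters beginning at offset {@code start}. */
    static String delete(String text, int start, int length) {
        if (start < 0 || length < 0 || start + length > text.length()) {
            throw new IndexOutOfBoundsException("start=" + start + ", length=" + length);
        }
        // StringBuilder.delete takes an exclusive end index, so convert the count.
        return new StringBuilder(text).delete(start, start + length).toString();
    }

    public static void main(String[] args) {
        // Removes the six characters ", Word" starting at offset 5.
        System.out.println(delete("Hello, Word!", 5, 6)); // prints Hello!
    }
}
```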

main

public static void main(java.lang.String[] args)
Takes two arguments: 1) the name of the Word file to read in, and 2) the location to write it out to.

Parameters:
args - the input file name, followed by the output location


Copyright 2010 The Apache Software Foundation or its licensors, as applicable.