public class CBZip2InputStream extends InputStream implements BZip2Constants
The decompression requires large amounts of memory. Thus you should call the close() method as soon as possible, to force CBZip2InputStream to release the allocated memory. See CBZip2OutputStream for information about memory usage.
CBZip2InputStream reads bytes from the compressed source stream exclusively via the single-byte read() method. Thus you should consider using a buffered source stream.
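Because each compressed byte is fetched with its own read() call, an unbuffered source receives one request per byte. The following minimal sketch illustrates the effect using only java.io. CountingSource and BufferingDemo are hypothetical helper names, not part of this API, and the 8192-byte buffer size is an arbitrary choice.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: counts the read requests that reach the underlying source. */
final class CountingSource extends InputStream {
    private final InputStream in;
    int calls; // number of read requests that reached the source

    CountingSource(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

final class BufferingDemo {
    /**
     * Drains a source of the given size one byte at a time, the way a
     * single-byte consumer would, and returns how many requests hit the source.
     */
    static int sourceCalls(int size, boolean buffered) {
        CountingSource source = new CountingSource(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source, 8192) : source) {
            while (in.read() != -1) {
                // discard the byte; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }
}
```

For a 100,000-byte source, the unbuffered variant issues one request per byte plus one for end of stream, while the buffered variant reaches the source only a handful of times with bulk requests.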
This Ant code was enhanced so that it can decompress individual blocks of bzip2 data. The current position in the stream is an important statistic for Hadoop. For example, LineRecordReader depends solely on the current position in the stream to track progress. The notion of position becomes complicated for compressed files: Hadoop splitting is done in terms of the compressed file, but a compressed file expands to a much larger amount of data.
The problem is handled in the following way. At object creation time, the stream searches for the next block start delimiter. Once such a marker is found, the stream stops there (any compressed data read in the process is discarded) and the position is updated (i.e. the caller of this class can find out the stream location). At this point the stream is ready for the actual reading (i.e. decompression) of data, and subsequent read calls return data. The position is updated when the caller of this class has read the current block + 1 bytes. The position is not updated while a block is being read; it can only be updated on block boundaries.
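The position rule described above can be modeled in a few lines. This is only an illustrative sketch of one reading of that rule, not the class's actual implementation: BlockPositionTracker and all of its members are hypothetical, block sizes are supplied up front rather than discovered while decoding, and end-of-stream handling is left out.

```java
/** Hypothetical model of block-boundary position reporting; not part of CBZip2InputStream. */
final class BlockPositionTracker {
    private final long[] compressedSizes;   // compressed size of each block, in bytes
    private final long[] uncompressedSizes; // decompressed size of each block, in bytes
    private long reportedPosition;          // position visible to the caller
    private int block;                      // index of the block currently being read
    private long consumedInBlock;           // decompressed bytes consumed from that block

    BlockPositionTracker(long startPosition, long[] compressedSizes, long[] uncompressedSizes) {
        if (compressedSizes.length == 0 || compressedSizes.length != uncompressedSizes.length) {
            throw new IllegalArgumentException("need one size pair per block");
        }
        this.reportedPosition = startPosition;
        this.compressedSizes = compressedSizes.clone();
        this.uncompressedSizes = uncompressedSizes.clone();
    }

    /** Records that the caller consumed one decompressed byte. */
    void consumeByte() {
        while (consumedInBlock == uncompressedSizes[block]) {
            // This byte is the "+ 1" byte: the first byte past the current block.
            if (block + 1 == uncompressedSizes.length) {
                throw new IllegalStateException("no more blocks in this model");
            }
            // Only now does the reported position jump, by the whole compressed block.
            reportedPosition += compressedSizes[block];
            block++;
            consumedInBlock = 0;
        }
        consumedInBlock++;
    }

    long getReportedPosition() {
        return reportedPosition;
    }
}
```

With two blocks of compressed size 100 and 80 that expand to 3 and 2 bytes, and a start position of 1000, the reported position stays at 1000 for the first three bytes and moves to 1100 only when the fourth byte is consumed.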
Instances of this class are not threadsafe.
Modifier and Type | Class and Description |
---|---|
static class | CBZip2InputStream.STATE: A state machine to keep track of the current state of the decoder |
Modifier and Type | Field and Description |
---|---|
static long | BLOCK_DELIMITER |
static long | EOS_DELIMITER |
Fields inherited from interface BZip2Constants: baseBlockSize, END_OF_BLOCK, END_OF_STREAM, G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, N_GROUPS, N_ITERS, NUM_OVERSHOOT_BYTES, rNums, RUNA, RUNB
Constructor and Description |
---|
CBZip2InputStream(InputStream in) |
CBZip2InputStream(InputStream in, SplittableCompressionCodec.READ_MODE readMode): Constructs a new CBZip2InputStream which decompresses bytes read from the specified stream. |
Modifier and Type | Method and Description |
---|---|
void | close() |
long | getProcessedByteCount(): Reports the number of bytes processed so far. |
static long | numberOfBytesTillNextMarker(InputStream in): Returns the number of bytes between the current stream position and the immediate next BZip2 block marker. |
int | read() |
int | read(byte[] dest, int offs, int len): In CONTINUOUS reading mode, this read method starts at the beginning of the compressed stream and ends at the end of file, emitting uncompressed data. |
protected void | reportCRCError() |
boolean | skipToNextMarker(long marker, int markerBitLength): Tries to find the marker (passed to it as the first parameter) in the stream. |
protected void | updateProcessedByteCount(int count): Keeps track of raw processed compressed bytes. |
void | updateReportedByteCount(int count): Called by the client of this class in case there are any corrections in the stream position. |
Methods inherited from class InputStream: available, mark, markSupported, read, reset, skip
public static final long BLOCK_DELIMITER
public static final long EOS_DELIMITER
public CBZip2InputStream(InputStream in, SplittableCompressionCodec.READ_MODE readMode) throws IOException
Although BZip2 headers begin with the magic "BZ", this constructor expects the next byte in the stream to be the first one after the magic. Callers therefore have to skip the first two bytes; otherwise this constructor throws an exception.
IOException - if the stream content is malformed or an I/O error occurs.
NullPointerException - if in == null
public CBZip2InputStream(InputStream in) throws IOException
IOException
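Since the constructor expects the stream to be positioned just past the two magic bytes, a caller reading a complete bzip2 file has to consume them first. The sketch below shows one way to do that with plain java.io, assuming the magic bytes are the ASCII characters B and Z as in standard bzip2 files. Bzip2Magic and skipMagic are hypothetical names, not part of this API.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; not part of CBZip2InputStream. */
final class Bzip2Magic {
    /**
     * Consumes the two magic bytes and returns the same stream, now positioned
     * at the first byte after the magic. Fails if the magic is absent.
     */
    static InputStream skipMagic(InputStream in) throws IOException {
        int first = in.read();
        int second = in.read();
        if (first != 'B' || second != 'Z') {
            throw new IOException("Missing \"BZ\" magic at start of stream");
        }
        return in;
    }
}
```

The returned stream could then be handed to CBZip2InputStream(InputStream in). Checking the two bytes rather than blindly skipping them gives an early, descriptive failure for input that is not bzip2 data.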
public long getProcessedByteCount()
protected void updateProcessedByteCount(int count)
count - the number of bytes to be added to the raw processed bytes
public void updateReportedByteCount(int count)
count - the number of bytes to be added to the reported bytes
public boolean skipToNextMarker(long marker, int markerBitLength) throws IOException, IllegalArgumentException
marker - The bit pattern to be found in the stream
markerBitLength - The number of bits in the marker
IOException
IllegalArgumentException - if markerBitLength is greater than 63
protected void reportCRCError() throws IOException
IOException
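Block markers in a bzip2 stream are not byte-aligned, which is why skipToNextMarker takes a bit pattern and a bit length. The following sketch shows one way such a search can work: a sliding window over the stream's bits, read most significant bit first. It is not the code used by this class. MarkerScan and its members are hypothetical, and the 48-bit value 0x314159265359 is the block-header magic of the bzip2 format, assumed here to be what BLOCK_DELIMITER holds since this page does not show the constant's value.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical bit-level marker search; not part of CBZip2InputStream. */
final class MarkerScan {
    /** Block-header magic of the bzip2 format (48 bits); an assumption, see the text above. */
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final int BLOCK_MAGIC_BITS = 48;

    /**
     * Scans the stream bit by bit and returns the number of bits consumed up to
     * and including the last bit of the marker, or -1 if the stream ends first.
     * Whole bytes are read, so any bits after the marker in the final byte are
     * lost; a real decoder would keep them in a bit buffer.
     */
    static long bitsThroughMarker(InputStream in, long marker, int markerBitLength)
            throws IOException {
        if (markerBitLength < 1 || markerBitLength > 63) {
            throw new IllegalArgumentException("markerBitLength must be in 1..63");
        }
        long mask = (1L << markerBitLength) - 1; // keeps only the newest markerBitLength bits
        long window = 0;
        long bitsRead = 0;
        int b;
        while ((b = in.read()) != -1) {
            for (int i = 7; i >= 0; i--) {
                // Shift the next bit into the window, most significant bit first.
                window = ((window << 1) | ((b >> i) & 1)) & mask;
                bitsRead++;
                if (bitsRead >= markerBitLength && window == marker) {
                    return bitsRead;
                }
            }
        }
        return -1;
    }
}
```

Dividing the returned bit count by 8 gives a byte distance comparable to the one numberOfBytesTillNextMarker reports, under the same assumptions.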
public static long numberOfBytesTillNextMarker(InputStream in) throws IOException
in - The InputStream
IOException
public int read() throws IOException
read in class InputStream
IOException
public int read(byte[] dest, int offs, int len) throws IOException
read in class InputStream
IOException - if the stream content is malformed or an I/O error occurs.
public void close() throws IOException
close in interface Closeable
close in interface AutoCloseable
close in class InputStream
IOException
Copyright © 2013 Apache Software Foundation. All rights reserved.