final class RopeByteString extends ByteString
ByteStrings
formed by concatenation of other
ByteStrings, without copying the data in the pieces. The concatenation is
represented as a tree whose leaf nodes are each a
ByteString.LeafByteString
.
Most of the operation here is inspired by the now-famous paper BAP95 Ropes: an Alternative to Strings hans-j. boehm, russ atkinson and michael plass
The algorithms described in the paper have been implemented for character
strings in com.google.common.string.Rope
and in the c++ class cord.cc
.
Fundamentally the Rope algorithm represents the collection of pieces as a binary tree. BAP95 uses a Fibonacci bound relating depth to a minimum sequence length, sequences that are too short relative to their depth cause a tree rebalance. More precisely, a tree of depth d is "balanced" in the terminology of BAP95 if its length is at least F(d+2), where F(n) is the n-the Fibonacci number. Thus for depths 0, 1, 2, 3, 4, 5,... we have minimum lengths 1, 2, 3, 5, 8, 13,...
Modifier and Type | Class and Description |
---|---|
private static class |
RopeByteString.Balancer
This class implements the balancing algorithm of BAP95.
|
private static class |
RopeByteString.PieceIterator
This class is a continuable tree traversal, which keeps the state
information which would exist on the stack in a recursive traversal instead
on a stack of "Bread Crumbs".
|
private class |
RopeByteString.RopeInputStream
This class is the
RopeByteString equivalent for
ByteArrayInputStream . |
ByteString.ByteIterator, ByteString.CodedBuilder, ByteString.LeafByteString, ByteString.Output
Modifier and Type | Field and Description |
---|---|
private ByteString |
left |
private int |
leftLength |
private static int[] |
minLengthByDepth
BAP95.
|
private ByteString |
right |
private static long |
serialVersionUID |
private int |
totalLength |
private int |
treeDepth |
CONCATENATE_BY_COPY_SIZE, EMPTY, MAX_READ_FROM_CHUNK_SIZE, MIN_READ_FROM_CHUNK_SIZE
Modifier | Constructor and Description |
---|---|
private |
RopeByteString(ByteString left,
ByteString right)
Create a new RopeByteString, which can be thought of as a new tree node, by
recording references to the two given strings.
|
Modifier and Type | Method and Description |
---|---|
java.nio.ByteBuffer |
asReadOnlyByteBuffer()
Constructs a read-only
java.nio.ByteBuffer whose content
is equal to the contents of this byte string. |
java.util.List<java.nio.ByteBuffer> |
asReadOnlyByteBufferList()
Constructs a list of read-only
java.nio.ByteBuffer objects
such that the concatenation of their contents is equal to the contents
of this byte string. |
byte |
byteAt(int index)
Gets the byte at the given index.
|
(package private) static ByteString |
concatenate(ByteString left,
ByteString right)
Concatenate the given strings while performing various optimizations to
slow the growth rate of tree depth and tree node count.
|
private static ByteString |
concatenateBytes(ByteString left,
ByteString right)
Concatenates two strings by copying data values.
|
void |
copyTo(java.nio.ByteBuffer target)
Copies bytes into a ByteBuffer.
|
protected void |
copyToInternal(byte[] target,
int sourceOffset,
int targetOffset,
int numberToCopy)
Internal (package private) implementation of
ByteString.copyTo(byte[],int,int,int) . |
boolean |
equals(java.lang.Object other) |
private boolean |
equalsFragments(ByteString other)
Determines if this string is equal to another of the same length by
iterating over the leaf nodes.
|
protected int |
getTreeDepth()
Return the depth of the tree representing this
ByteString , if any,
whose root is this node. |
protected boolean |
isBalanced()
Determines if the tree is balanced according to BAP95, which means the tree
is flat-enough with respect to the bounds.
|
boolean |
isValidUtf8()
Tells whether this
ByteString represents a well-formed UTF-8
byte sequence, such that the original bytes can be converted to a
String object and then round tripped back to bytes without loss. |
CodedInputStream |
newCodedInput()
Creates a
CodedInputStream which can be used to read the bytes. |
java.io.InputStream |
newInput()
Creates an
InputStream which can be used to read the bytes. |
(package private) static RopeByteString |
newInstanceForTest(ByteString left,
ByteString right)
Create a new RopeByteString for testing only while bypassing all the
defenses of
concatenate(ByteString, ByteString) . |
protected int |
partialHash(int h,
int offset,
int length)
Compute the hash across the value bytes starting with the given hash, and
return the result.
|
protected int |
partialIsValidUtf8(int state,
int offset,
int length)
Tells whether the given byte sequence is a well-formed, malformed, or
incomplete UTF-8 byte sequence.
|
private void |
readObject(java.io.ObjectInputStream in) |
int |
size()
Gets the number of bytes.
|
ByteString |
substring(int beginIndex,
int endIndex)
Takes a substring of this one.
|
protected java.lang.String |
toStringInternal(java.nio.charset.Charset charset)
Constructs a new
String by decoding the bytes using the
specified charset. |
(package private) java.lang.Object |
writeReplace() |
(package private) void |
writeTo(ByteOutput output)
Writes this
ByteString to the provided ByteOutput . |
void |
writeTo(java.io.OutputStream outputStream)
Writes a copy of the contents of this byte string to the specified output stream argument.
|
(package private) void |
writeToInternal(java.io.OutputStream out,
int sourceOffset,
int numberToWrite)
Internal version of
ByteString.writeTo(OutputStream,int,int) that assumes
all error checking has already been done. |
checkIndex, checkRange, concat, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFromUtf8, copyTo, copyTo, endsWith, hashCode, isEmpty, iterator, newCodedBuilder, newOutput, newOutput, peekCachedHashCode, readFrom, readFrom, readFrom, startsWith, substring, toByteArray, toString, toString, toString, toStringUtf8, wrap, wrap, wrap, writeTo
private static final int[] minLengthByDepth
RopeByteString
of
depth n is "balanced", i.e flat enough, if its length is at least Fn+2,
e.g. a "balanced" RopeByteString
of depth 1 must have length at
least 2, of depth 4 must have length >= 8, etc.
There's nothing special about using the Fibonacci numbers for this, but they are a reasonable sequence for encapsulating the idea that we are OK with longer strings being encoded in deeper binary trees.
For 32-bit integers, this array has length 46.
private final int totalLength
private final ByteString left
private final ByteString right
private final int leftLength
private final int treeDepth
private static final long serialVersionUID
private RopeByteString(ByteString left, ByteString right)
left
- string on the left of this node, should have size() >
0
right
- string on the right of this node, should have size() >
0
static ByteString concatenate(ByteString left, ByteString right)
ByteString.LeafByteString
or a
RopeByteString
depending on which optimizations, if any, were
applied.
Small pieces of length less than ByteString.CONCATENATE_BY_COPY_SIZE
may be copied by value here, as in
BAP95. Large pieces are referenced without copy.
left
- string on the leftright
- string on the rightprivate static ByteString concatenateBytes(ByteString left, ByteString right)
left
- string on the leftright
- string on the rightstatic RopeByteString newInstanceForTest(ByteString left, ByteString right)
concatenate(ByteString, ByteString)
. This allows
testing trees of specific structure. We are also able to insert empty
leaves, though these are dis-allowed, so that we can make sure the
implementation can withstand their presence.left
- string on the left of this noderight
- string on the right of this nodepublic byte byteAt(int index)
ArrayIndexOutOfBoundsException
for backwards-compatibility
reasons although it would more properly be IndexOutOfBoundsException
.byteAt
in class ByteString
index
- index of bytejava.lang.ArrayIndexOutOfBoundsException
- index
is < 0 or >= sizepublic int size()
ByteString
size
in class ByteString
protected int getTreeDepth()
ByteString
ByteString
, if any,
whose root is this node. If this is a leaf node, return 0.getTreeDepth
in class ByteString
protected boolean isBalanced()
isBalanced
in class ByteString
public ByteString substring(int beginIndex, int endIndex)
Substrings of length < 2
should result in at most a single
recursive call chain, terminating at a leaf node. Thus the result will be a
ByteString.LeafByteString
.
substring
in class ByteString
beginIndex
- start at this indexendIndex
- the last character is the one before this indexprotected void copyToInternal(byte[] target, int sourceOffset, int targetOffset, int numberToCopy)
ByteString
ByteString.copyTo(byte[],int,int,int)
.
It assumes that all error checking has already been performed and that
numberToCopy > 0
.copyToInternal
in class ByteString
public void copyTo(java.nio.ByteBuffer target)
ByteString
copyTo
in class ByteString
target
- ByteBuffer to copy into.public java.nio.ByteBuffer asReadOnlyByteBuffer()
ByteString
java.nio.ByteBuffer
whose content
is equal to the contents of this byte string.
The result uses the same backing array as the byte string, if possible.asReadOnlyByteBuffer
in class ByteString
public java.util.List<java.nio.ByteBuffer> asReadOnlyByteBufferList()
ByteString
java.nio.ByteBuffer
objects
such that the concatenation of their contents is equal to the contents
of this byte string. The result uses the same backing arrays as the
byte string.
By returning a list, implementations of this method may be able to avoid copying even when there are multiple backing arrays.
asReadOnlyByteBufferList
in class ByteString
public void writeTo(java.io.OutputStream outputStream) throws java.io.IOException
ByteString
writeTo
in class ByteString
outputStream
- the output stream to which to write the data.java.io.IOException
- if an I/O error occurs.void writeToInternal(java.io.OutputStream out, int sourceOffset, int numberToWrite) throws java.io.IOException
ByteString
ByteString.writeTo(OutputStream,int,int)
that assumes
all error checking has already been done.writeToInternal
in class ByteString
java.io.IOException
void writeTo(ByteOutput output) throws java.io.IOException
ByteString
ByteString
to the provided ByteOutput
. Calling
this method may result in multiple operations on the target ByteOutput
.
This method may expose internal backing buffers of the ByteString
to the ByteOutput
in order to avoid additional copying overhead. It would be possible for a malicious
ByteOutput
to corrupt the ByteString
. Use with caution!
writeTo
in class ByteString
output
- the output target to receive the bytesjava.io.IOException
- if an I/O error occursUnsafeByteOperations.unsafeWriteTo(ByteString, ByteOutput)
protected java.lang.String toStringInternal(java.nio.charset.Charset charset)
ByteString
String
by decoding the bytes using the
specified charset.toStringInternal
in class ByteString
charset
- encode using this charsetpublic boolean isValidUtf8()
ByteString
ByteString
represents a well-formed UTF-8
byte sequence, such that the original bytes can be converted to a
String object and then round tripped back to bytes without loss.
More precisely, returns true
whenever:
Arrays.equals(byteString.toByteArray(),
new String(byteString.toByteArray(), "UTF-8").getBytes("UTF-8"))
This method returns false
for "overlong" byte sequences,
as well as for 3-byte sequences that would map to a surrogate
character, in accordance with the restricted definition of UTF-8
introduced in Unicode 3.1. Note that the UTF-8 decoder included in
Oracle's JDK has been modified to also reject "overlong" byte
sequences, but (as of 2011) still accepts 3-byte surrogate
character byte sequences.
See the Unicode Standard,
Table 3-6. UTF-8 Bit Distribution,
Table 3-7. Well Formed UTF-8 Byte Sequences.
isValidUtf8
in class ByteString
ByteString
are a
well-formed UTF-8 byte sequenceprotected int partialIsValidUtf8(int state, int offset, int length)
ByteString
ByteString
segments.partialIsValidUtf8
in class ByteString
state
- either 0
(if this is the initial decoding operation)
or the value returned from a call to a partial decoding method for the
previous bytesoffset
- offset of the first byte to checklength
- number of bytes to check-1
if the partial byte sequence is definitely malformed,
0
if it is well-formed (no additional input needed), or, if the
byte sequence is "incomplete", i.e. apparently terminated in the middle of
a character, an opaque integer "state" value containing enough information
to decode the character when passed to a subsequent invocation of a
partial decoding method.public boolean equals(java.lang.Object other)
equals
in class ByteString
private boolean equalsFragments(ByteString other)
other
- string of the same length as this oneprotected int partialHash(int h, int offset, int length)
ByteString
partialHash
in class ByteString
h
- starting hash valueoffset
- offset into this value to start looking at data valueslength
- number of data values to include in the hash computationpublic CodedInputStream newCodedInput()
ByteString
CodedInputStream
which can be used to read the bytes.
Using this is often more efficient than creating a CodedInputStream
that wraps the result of ByteString.newInput()
.newCodedInput
in class ByteString
public java.io.InputStream newInput()
ByteString
InputStream
which can be used to read the bytes.
The InputStream
returned by this method is guaranteed to be
completely non-blocking. The method InputStream.available()
returns the number of bytes remaining in the stream. The methods
InputStream.read(byte[])
, InputStream.read(byte[],int,int)
and InputStream.skip(long)
will read/skip as many bytes as are
available. The method InputStream.markSupported()
returns
true
.
The methods in the returned InputStream
might not be
thread safe.
newInput
in class ByteString
java.lang.Object writeReplace()
private void readObject(java.io.ObjectInputStream in) throws java.io.IOException
java.io.IOException