public final class BytesRefHash
extends java.lang.Object
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in continuous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase for each added BytesRef.
Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2 bytes. The internal storage is limited to 2 GB of total byte storage.
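The length limit in the note above comes down to simple arithmetic. The following is a minimal sketch, not Lucene code: the class MaxLengthSketch and the helper fits are hypothetical, and the value 1 << 15 for BYTE_BLOCK_SIZE is an assumption taken from Lucene's sources that should be checked against the version in use.

```java
public class MaxLengthSketch {
    // Assumption: mirrors ByteBlockPool.BYTE_BLOCK_SIZE (1 << 15) in Lucene's sources.
    static final int BYTE_BLOCK_SIZE = 1 << 15;

    // Longest byte sequence allowed by the note: BYTE_BLOCK_SIZE - 2.
    static final int MAX_LENGTH = BYTE_BLOCK_SIZE - 2;

    // Hypothetical helper: true if a value of this length respects the limit.
    static boolean fits(int length) {
        return length >= 0 && length <= MAX_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(MAX_LENGTH);           // 32766 under the assumed block size
        System.out.println(fits(MAX_LENGTH));     // true
        System.out.println(fits(MAX_LENGTH + 1)); // false
    }
}
```

A caller that cannot guarantee short inputs could run a check like this before calling add(BytesRef) rather than relying on the exception.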
Modifier and Type | Class and Description |
---|---|
static class | BytesRefHash.BytesStartArray: Manages allocation of the per-term addresses. |
static class | BytesRefHash.DirectBytesStartArray: A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance. |
static class | BytesRefHash.MaxBytesLengthExceededException |
Modifier and Type | Field and Description |
---|---|
(package private) int[] | bytesStart |
private BytesRefHash.BytesStartArray | bytesStartArray |
private Counter | bytesUsed |
private int | count |
static int | DEFAULT_CAPACITY |
private int | hashHalfSize |
private int | hashMask |
private int | hashSize |
private int[] | ids |
private int | lastCount |
(package private) ByteBlockPool | pool |
private BytesRef | scratch1 |
Constructor and Description |
---|
BytesRefHash() |
BytesRefHash(ByteBlockPool pool): Creates a new BytesRefHash |
BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray): Creates a new BytesRefHash |
Modifier and Type | Method and Description |
---|---|
int | add(BytesRef bytes): Adds a new BytesRef |
int | addByPoolOffset(int offset): Adds an "arbitrary" int offset instead of a BytesRef term. |
int | byteStart(int bytesID): Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID |
void | clear() |
void | clear(boolean resetPool) |
void | close(): Closes the BytesRefHash and releases all internally used memory |
int[] | compact(): Returns the ids array in arbitrary order. |
private int | doHash(byte[] bytes, int offset, int length) |
private boolean | equals(int id, BytesRef b) |
int | find(BytesRef bytes): Returns the id of the given BytesRef. |
private int | findHash(BytesRef bytes) |
BytesRef | get(int bytesID, BytesRef ref): Populates and returns a BytesRef with the bytes for the given bytesID. |
private void | rehash(int newSize, boolean hashOnData): Called when hash is too small (> 50% occupied) or too large (< 20% occupied). |
void | reinit(): Reinitializes the BytesRefHash after a previous clear() call. |
private boolean | shrink(int targetSize) |
int | size(): Returns the number of BytesRef values in this BytesRefHash. |
int[] | sort(): Returns the values array sorted by the referenced byte values. |
public static final int DEFAULT_CAPACITY
final ByteBlockPool pool
int[] bytesStart
private final BytesRef scratch1
private int hashSize
private int hashHalfSize
private int hashMask
private int count
private int lastCount
private int[] ids
private final BytesRefHash.BytesStartArray bytesStartArray
private Counter bytesUsed
public BytesRefHash()
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash
public int size()
Returns the number of BytesRef values in this BytesRefHash.
Returns:
the number of BytesRef values in this BytesRefHash.
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
Parameters:
bytesID - the id
ref - the BytesRef to populate
public int[] compact()
Returns the ids array in arbitrary order. Valid ids start at offset 0 and end at a limit of size() - 1.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
public int[] sort()
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
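To illustrate what "sorted by the referenced byte values" can mean, here is a minimal standalone sketch that orders ids by the byte sequences they refer to. It is not Lucene's implementation: the class SortIdsSketch and its methods are hypothetical, and comparing bytes as unsigned values is an assumption about the ordering.

```java
import java.util.Arrays;

public class SortIdsSketch {
    // Assumed ordering: lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    // Returns the ids 0..values.length-1 ordered by the bytes each id refers to.
    static int[] sortIds(byte[][] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        Arrays.sort(boxed, (x, y) -> compareUnsigned(values[x], values[y]));
        int[] ids = new int[boxed.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = boxed[i];
        }
        return ids;
    }

    public static void main(String[] args) {
        byte[][] values = { {(byte) 0x80}, {0x01}, {0x01, 0x00} };
        System.out.println(Arrays.toString(sortIds(values))); // [1, 2, 0]
    }
}
```

Only the array of ids is reordered in this sketch; the byte values themselves stay where they are.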
private boolean equals(int id, BytesRef b)
private boolean shrink(int targetSize)
public void clear(boolean resetPool)
public void clear()
public void close()
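public int add(BytesRef bytes)
Adds a new BytesRef
Parameters:
bytes - the bytes to hash
Returns:
the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before.
Throws:
BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2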
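The return-value convention of add(BytesRef) can be modeled with plain Java collections. The sketch below is a standalone illustration of the documented contract, not Lucene's implementation: AddContractSketch, its HashMap-based storage, and the decode helper are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class AddContractSketch {
    // ByteBuffer keys compare by content, so equal byte sequences share one entry.
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    // Returns a fresh id (>= 0) for unseen bytes, otherwise -(id)-1 for the existing id.
    public int add(byte[] bytes) {
        ByteBuffer key = ByteBuffer.wrap(bytes.clone());
        Integer existing = ids.get(key);
        if (existing != null) {
            return -existing - 1;
        }
        int id = ids.size(); // ids are handed out in increasing order, starting at 0
        ids.put(key, id);
        return id;
    }

    // Hypothetical helper: recovers the id from either form of add()'s result.
    public static int decode(int result) {
        return result >= 0 ? result : -result - 1;
    }

    public static void main(String[] args) {
        AddContractSketch hash = new AddContractSketch();
        int first = hash.add(new byte[] {1, 2});  // 0: new value
        int second = hash.add(new byte[] {3});    // 1: new value
        int again = hash.add(new byte[] {1, 2});  // -1: already mapped to id 0
        System.out.println(first + " " + second + " " + again + " " + decode(again));
    }
}
```

Because id 0 is valid, a plain negation could not signal "already present"; subtracting one more keeps every such result strictly negative.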
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
Parameters:
bytes - the bytes to look for
Returns:
the id of the given bytes, or -1 if there is no mapping for the given bytes.
private int findHash(BytesRef bytes)
public int addByPoolOffset(int offset)
private void rehash(int newSize, boolean hashOnData)
Called when hash is too small (> 50% occupied) or too large (< 20% occupied).
private int doHash(byte[] bytes, int offset, int length)
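The occupancy thresholds quoted for rehash(int, boolean) can be sketched as a small sizing policy. This is an illustration under stated assumptions, not the class's actual logic: ResizePolicySketch, nextTableSize, and slot are hypothetical, doubling and halving are assumed resize steps, and the mask computation assumes a power-of-two table size with hashMask equal to hashSize - 1.

```java
public class ResizePolicySketch {
    // Hypothetical policy: grow above 50% occupancy, shrink below 20%.
    static int nextTableSize(int hashSize, int count) {
        if (count * 2 > hashSize) {
            return hashSize * 2;   // more than 50% occupied: too small
        }
        if (count * 5 < hashSize && hashSize > 2) {
            return hashSize / 2;   // less than 20% occupied: too large
        }
        return hashSize;           // occupancy is within bounds
    }

    // With a power-of-two size, masking with size - 1 picks a slot without a modulo.
    static int slot(int hashCode, int hashSize) {
        return hashCode & (hashSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(nextTableSize(16, 9)); // 32
        System.out.println(nextTableSize(16, 8)); // 16
        System.out.println(nextTableSize(16, 3)); // 8
        System.out.println(slot(37, 16));         // 5
    }
}
```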
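public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously this method has no effect.
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID
Parameters:
bytesID - the id to look up
Returns:
the bytesStart offset into the internally used ByteBlockPool for the given id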