public class BKDTreeDocValuesFormat extends DocValuesFormat
A DocValuesFormat to efficiently index geo-spatial lat/lon points from BKDPointField for fast bounding-box (BKDPointInBBoxQuery) and polygon (BKDPointInPolygonQuery) queries.
This wraps Lucene50DocValuesFormat, but saves its own BKD tree structures to disk for fast query-time intersection. See this paper for details.
The BKD tree slices up 2D (lat/lon) space into smaller and smaller rectangles until the smallest rectangles each hold approximately between X/2 and X points (X defaults to 1024). Such leaf cells are then written to disk as blocks, while the index tree structure, which records how space was subdivided, is loaded into heap at search time. At search time, the tree is traversed recursively, descending into the left or right child only if that child overlaps the query shape. Once a leaf block is reached, all documents in it are collected if the cell is fully enclosed by the query shape; otherwise they are filtered first and then collected.
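The split-then-intersect idea described above can be sketched in a few lines of plain Java. This is a toy, in-memory illustration only, not Lucene's implementation or disk format: the class names `BkdSketch`, `Point` and `Node` are hypothetical, the query shape is limited to a bounding box, and splitting the wider dimension at the median is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy in-memory sketch of the split-then-intersect idea; not Lucene's code or disk format. */
public class BkdSketch {
    static final class Point {
        final double lat, lon;
        final int docID;
        Point(double lat, double lon, int docID) { this.lat = lat; this.lon = lon; this.docID = docID; }
    }

    /** A cell of the tree: inner nodes have two children, leaf nodes hold a block of points. */
    static final class Node {
        double minLat, maxLat, minLon, maxLon;
        Point[] block;     // non-null only for leaf cells
        Node left, right;  // non-null only for inner nodes
    }

    /** Recursively halves pts[from, to) until a cell holds at most maxPointsInLeaf points. Sorts pts in place. */
    static Node build(Point[] pts, int from, int to, int maxPointsInLeaf,
                      double minLat, double maxLat, double minLon, double maxLon) {
        if (maxPointsInLeaf < 1) throw new IllegalArgumentException("maxPointsInLeaf must be >= 1");
        Node n = new Node();
        n.minLat = minLat; n.maxLat = maxLat; n.minLon = minLon; n.maxLon = maxLon;
        if (to - from <= maxPointsInLeaf) {
            n.block = Arrays.copyOfRange(pts, from, to); // small enough: this cell becomes a leaf block
            return n;
        }
        // Assumption: split the wider dimension at the median so each half gets about half the points.
        final boolean byLat = (maxLat - minLat) >= (maxLon - minLon);
        Arrays.sort(pts, from, to, Comparator.comparingDouble((Point p) -> byLat ? p.lat : p.lon));
        int mid = (from + to) >>> 1;
        double split = byLat ? pts[mid].lat : pts[mid].lon;
        if (byLat) {
            n.left = build(pts, from, mid, maxPointsInLeaf, minLat, split, minLon, maxLon);
            n.right = build(pts, mid, to, maxPointsInLeaf, split, maxLat, minLon, maxLon);
        } else {
            n.left = build(pts, from, mid, maxPointsInLeaf, minLat, maxLat, minLon, split);
            n.right = build(pts, mid, to, maxPointsInLeaf, minLat, maxLat, split, maxLon);
        }
        return n;
    }

    /** Collects docIDs of points inside the query box, visiting only cells that overlap it. */
    static void query(Node n, double qMinLat, double qMaxLat, double qMinLon, double qMaxLon,
                      List<Integer> hits) {
        if (n.maxLat < qMinLat || n.minLat > qMaxLat || n.maxLon < qMinLon || n.minLon > qMaxLon) {
            return; // cell and query box are disjoint
        }
        if (n.block == null) { // inner node: recurse into both children
            query(n.left, qMinLat, qMaxLat, qMinLon, qMaxLon, hits);
            query(n.right, qMinLat, qMaxLat, qMinLon, qMaxLon, hits);
            return;
        }
        // Leaf: take every point if the cell is fully enclosed, otherwise filter point by point.
        boolean enclosed = n.minLat >= qMinLat && n.maxLat <= qMaxLat
                        && n.minLon >= qMinLon && n.maxLon <= qMaxLon;
        for (Point p : n.block) {
            if (enclosed || (p.lat >= qMinLat && p.lat <= qMaxLat
                          && p.lon >= qMinLon && p.lon <= qMaxLon)) {
                hits.add(p.docID);
            }
        }
    }

    public static void main(String[] args) {
        Point[] pts = {
            new Point(40.7, -74.0, 0), new Point(51.5, -0.1, 1), new Point(48.9, 2.4, 2),
            new Point(35.7, 139.7, 3), new Point(-33.9, 151.2, 4), new Point(52.5, 13.4, 5)
        };
        Node root = build(pts, 0, pts.length, 2, -90, 90, -180, 180);
        List<Integer> hits = new ArrayList<>();
        query(root, 45, 55, -5, 15, hits); // rough box over part of Europe
        System.out.println(hits);
    }
}
```

Because every point lands in exactly one leaf block, a document is reported at most once, and whole subtrees that do not overlap the query box are never visited.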
The index is also quite compact, because docs only appear once in the tree (no "prefix terms").
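For a rough feel of how the leaf size X relates to the number of leaf blocks (and hence to the size of the in-heap index), the arithmetic can be sketched as below. This is a back-of-envelope estimate that assumes every split halves the point count exactly; the class `BkdLeafEstimate` and its method are hypothetical and are not part of Lucene.

```java
/** Back-of-envelope sizing, assuming every split halves the point count exactly. */
public class BkdLeafEstimate {

    /** Returns {leafCount, pointsPerLeaf} after halving pointCount until it fits in one leaf. */
    static long[] estimate(long pointCount, int maxPointsInLeafNode) {
        if (maxPointsInLeafNode < 1) {
            throw new IllegalArgumentException("maxPointsInLeafNode must be >= 1");
        }
        long perLeaf = pointCount;
        long leaves = 1;
        while (perLeaf > maxPointsInLeafNode) {
            perLeaf = (perLeaf + 1) / 2; // halve, rounding up
            leaves *= 2;                 // each level of splitting doubles the cell count
        }
        return new long[] {leaves, perLeaf};
    }

    public static void main(String[] args) {
        for (int maxLeaf : new int[] {256, 1024, 4096}) {
            long[] e = estimate(10_000_000L, maxLeaf);
            System.out.println("X=" + maxLeaf + " -> " + e[0] + " leaves, ~" + e[1] + " points each");
        }
    }
}
```

Under this assumption, 10 million points with X = 1024 end up in 16384 leaves of about 611 points each, which falls in the X/2 to X range described above. A smaller X means more leaves and therefore a larger in-heap tree.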
In addition to the files written by Lucene50DocValuesFormat, this format writes:
The disk format is experimental and free to change suddenly, and this code likely has new and exciting bugs!
Constructor and Description |
---|
BKDTreeDocValuesFormat() Default constructor |
BKDTreeDocValuesFormat(int maxPointsInLeafNode, int maxPointsSortInHeap) Creates this with custom configuration. |
Modifier and Type | Method and Description |
---|---|
DocValuesConsumer | fieldsConsumer(SegmentWriteState state) Returns a DocValuesConsumer to write docvalues to the index. |
DocValuesProducer | fieldsProducer(SegmentReadState state) Returns a DocValuesProducer to read docvalues from the index. |
Methods inherited from class DocValuesFormat: availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
public BKDTreeDocValuesFormat()
public BKDTreeDocValuesFormat(int maxPointsInLeafNode, int maxPointsSortInHeap)
maxPointsInLeafNode - Maximum number of points in each leaf cell. Smaller values create a deeper tree with a larger in-heap index and possibly faster searching. The default is 1024.
maxPointsSortInHeap - Maximum number of points for which an in-heap sort can be used. When the number of points exceeds this, a (slower) offline sort is used. The default is 128 * 1024.
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException
Description copied from class: DocValuesFormat
Returns a DocValuesConsumer to write docvalues to the index.
Specified by: fieldsConsumer in class DocValuesFormat
Throws: IOException
public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOException
Description copied from class: DocValuesFormat
Returns a DocValuesProducer to read docvalues from the index.
NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
Specified by: fieldsProducer in class DocValuesFormat
Throws: IOException
Copyright © 2000–2015 The Apache Software Foundation. All rights reserved.