public class BlockGroupingCollector extends SimpleCollector
This collector groups documents in a single pass, provided that all documents sharing a group value were indexed together as a block using the atomic IndexWriter.addDocuments() or IndexWriter.updateDocuments() API.
This results in faster performance (~25% faster QPS) than the two-pass grouping collectors, with the tradeoff being that the documents in each group must always be indexed as a block. This collector also fills in TopGroups.totalGroupCount without requiring the separate AllGroupsCollector. However, this collector does not fill in the groupValue of each group; this field will always be null.
NOTE: this collector makes no effort to verify the docs were in fact indexed as a block, so it's up to you to ensure this was the case.
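Because the collector trusts the index layout, a test-time check can help catch mistakes. The following is a minimal sketch that assumes you can list each document's group key in docID order, for example from a stored field while testing. The helper name is hypothetical and not part of Lucene. It reports whether every key occupies a single contiguous run of docIDs. Block indexing guarantees that property, but contiguity alone does not prove that the last-document marker is correct.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical test helper, not part of Lucene. */
final class BlockLayoutCheck {
  /**
   * Returns true if each group key appears as one contiguous run when the keys are
   * listed in docID order. Keys must be non-null.
   */
  static boolean groupsAreContiguous(String[] groupKeyByDoc) {
    Set<String> finishedKeys = new HashSet<>();
    String currentKey = null;
    for (String key : groupKeyByDoc) {
      if (key.equals(currentKey)) {
        continue; // still inside the current run
      }
      if (currentKey != null) {
        finishedKeys.add(currentKey); // the previous run just ended
      }
      if (finishedKeys.contains(key)) {
        return false; // this key already had an earlier run, so it is not a single block
      }
      currentKey = key;
    }
    return true;
  }
}
```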
See org.apache.lucene.search.grouping for more details, including a full code example.
Modifier and Type | Class and Description |
---|---|
private class | BlockGroupingCollector.GroupQueue |
private static class | BlockGroupingCollector.OneGroup |
private static class | BlockGroupingCollector.ScoreAndDoc |
Modifier and Type | Field and Description |
---|---|
private int | bottomSlot |
private FieldComparator<?>[] | comparators |
private int | compIDXEnd |
private LeafReaderContext | currentReaderContext |
private int | docBase |
private boolean | groupCompetes |
private int | groupEndDocID |
private BlockGroupingCollector.GroupQueue | groupQueue |
private Sort | groupSort |
private Weight | lastDocPerGroup |
private DocIdSetIterator | lastDocPerGroupBits |
private LeafFieldComparator[] | leafComparators |
private boolean | needsScores |
private int[] | pendingSubDocs |
private float[] | pendingSubScores |
private boolean | queueFull |
private int[] | reversed |
private Scorable | scorer |
private int | subDocUpto |
private int | topGroupDoc |
private int | topNGroups |
private int | totalGroupCount |
private int | totalHitCount |
Constructor and Description |
---|
BlockGroupingCollector(Sort groupSort, int topNGroups, boolean needsScores, Weight lastDocPerGroup): Create the single-pass collector. |
Modifier and Type | Method and Description |
---|---|
void | collect(int doc): Called once for every document matching a query, with the unbased (segment-relative) document number. |
protected void | doSetNextReader(LeafReaderContext readerContext): This method is called before collecting the documents of readerContext. |
TopGroups<?> | getTopGroups(Sort withinGroupSort, int groupOffset, int withinGroupOffset, int maxDocsPerGroup): Returns the grouped results. |
private void | processGroup() |
ScoreMode | scoreMode(): Indicates what features are required from the scorer. |
void | setScorer(Scorable scorer): Called before successive calls to LeafCollector.collect(int). |
Methods inherited from class SimpleCollector: getLeafCollector
private int[] pendingSubDocs
private float[] pendingSubScores
private int subDocUpto
private final Sort groupSort
private final int topNGroups
private final Weight lastDocPerGroup
private final boolean needsScores
private final FieldComparator<?>[] comparators
private final LeafFieldComparator[] leafComparators
private final int[] reversed
private final int compIDXEnd
private int bottomSlot
private boolean queueFull
private LeafReaderContext currentReaderContext
private int topGroupDoc
private int totalHitCount
private int totalGroupCount
private int docBase
private int groupEndDocID
private DocIdSetIterator lastDocPerGroupBits
private Scorable scorer
private final BlockGroupingCollector.GroupQueue groupQueue
private boolean groupCompetes
public BlockGroupingCollector(Sort groupSort, int topNGroups, boolean needsScores, Weight lastDocPerGroup)
Parameters:
groupSort - The Sort used to sort the groups. The top sorted document within each group according to groupSort determines how that group sorts against other groups. This must be non-null; i.e., if you want to groupSort by relevance, use Sort.RELEVANCE.
topNGroups - How many top groups to keep.
needsScores - true if the collected documents require scores, either because relevance is included in the withinGroupSort or because you plan to pass true for either getScores or getMaxScores to getTopGroups(org.apache.lucene.search.Sort, int, int, int).
lastDocPerGroup - a Weight that marks the last document in each group.
private void processGroup() throws java.io.IOException
Throws:
java.io.IOException
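processGroup() has no description here. Based on the constructor's lastDocPerGroup parameter, one plausible model is that a group ends when a collected document falls past the last marked document of the current block. At that point, the pending group can be processed. The following is a plain-Java sketch of that boundary detection. It assumes the marked docIDs are available as a sorted array. The class name is hypothetical, and the advance step only imitates DocIdSetIterator.advance rather than calling it.

```java
/**
 * Conceptual sketch, not Lucene's code: detects group boundaries from the sorted docIDs
 * of the documents marked as "last in group" (what the lastDocPerGroup Weight matches).
 */
final class GroupBoundaryTracker {
  /** Same sentinel value Lucene uses for DocIdSetIterator.NO_MORE_DOCS. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] lastDocs; // ascending docIDs of the last document of each group
  private int idx = -1;
  private int groupEndDocID = -1;

  GroupBoundaryTracker(int[] sortedLastDocs) {
    this.lastDocs = sortedLastDocs;
  }

  /** Imitates DocIdSetIterator.advance: move to the first marked docID >= target. */
  private int advance(int target) {
    do {
      idx++;
    } while (idx < lastDocs.length && lastDocs[idx] < target);
    return idx < lastDocs.length ? lastDocs[idx] : NO_MORE_DOCS;
  }

  /**
   * Called with matching docIDs in increasing order. Returns true when doc falls past
   * the end of the current group, i.e. the previous group (if any) is finished.
   */
  boolean startsNewGroup(int doc) {
    if (doc <= groupEndDocID) {
      return false; // still inside the current block
    }
    groupEndDocID = advance(doc);
    return true;
  }
}
```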
public TopGroups<?> getTopGroups(Sort withinGroupSort, int groupOffset, int withinGroupOffset, int maxDocsPerGroup) throws java.io.IOException
Returns the grouped results.
NOTE: This collector is unable to compute the groupValue per group, so it will always be null. This is normally not a problem, as you can obtain the value just like you obtain other values for each matching document (e.g., via stored fields, via DocValues, etc.).
Parameters:
withinGroupSort - The Sort used to sort documents within each group.
groupOffset - Which group to start from.
withinGroupOffset - Which document to start from within each group.
maxDocsPerGroup - How many top documents to keep within each group.
Throws:
java.io.IOException
public void setScorer(Scorable scorer) throws java.io.IOException
Called before successive calls to LeafCollector.collect(int). Implementations that need the score of the current document (passed in to LeafCollector.collect(int)) should save the passed-in Scorable and call scorer.score() when needed.
Specified by:
setScorer in interface LeafCollector
Overrides:
setScorer in class SimpleCollector
Throws:
java.io.IOException
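The advice to save the scorer and score lazily can be illustrated without Lucene. In this sketch, ScoreSource is a hypothetical one-method stand-in for Scorable. The even-docID rule is an arbitrary example of a collector that needs scores only for some hits. The point is that setScorer only stores the reference, and score() is called only from collect, for the current document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Lucene's Scorable, reduced to the one call used here. */
interface ScoreSource {
  float score();
}

/** Sketch of the pattern described for setScorer: keep the scorer, score only on demand. */
final class LazyScoringCollector {
  private ScoreSource scorer; // saved by setScorer, read by collect
  final List<Float> collectedScores = new ArrayList<>();
  int scoreCalls = 0;

  void setScorer(ScoreSource scorer) {
    this.scorer = scorer;
  }

  void collect(int doc) {
    // Hypothetical rule: only even docIDs need a score, so odd ones never call score().
    if (doc % 2 != 0) {
      return;
    }
    scoreCalls++;
    collectedScores.add(scorer.score()); // the scorer is positioned on the current doc
  }
}
```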
public void collect(int doc) throws java.io.IOException
Called once for every document matching a query, with the unbased (segment-relative) document number.
Note: The collection of the current segment can be terminated by throwing a CollectionTerminatedException. In this case, the remaining docs of the current LeafReaderContext will be skipped, and IndexSearcher will swallow the exception and continue collection with the next leaf.
Note: This is called in an inner search loop. For good search performance, implementations of this method should not call IndexSearcher.doc(int) or IndexReader.document(int) on every hit. Doing so can slow searches by an order of magnitude or more.
Specified by:
collect in interface LeafCollector
Overrides:
collect in class SimpleCollector
Throws:
java.io.IOException
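The following plain-Java sketch shows the control flow from the first note. A per-leaf collector throws an exception to abandon the rest of its segment, and the driving loop swallows that exception and continues with the next leaf. LeafTerminated and CappedLeafCollector are hypothetical stand-ins for CollectionTerminatedException and a real LeafCollector. The loop only imitates what the note says IndexSearcher does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Lucene's CollectionTerminatedException. */
final class LeafTerminated extends RuntimeException {
  LeafTerminated() {
    super(null, null, false, false); // no stack trace: it is used for control flow
  }
}

/** A hypothetical per-leaf collector that stops its leaf after maxPerLeaf hits. */
final class CappedLeafCollector {
  private final int leaf;
  private final int maxPerLeaf;
  private final List<String> out;
  private int seen = 0;

  CappedLeafCollector(int leaf, int maxPerLeaf, List<String> out) {
    this.leaf = leaf;
    this.maxPerLeaf = maxPerLeaf;
    this.out = out;
  }

  void collect(int doc) {
    if (seen == maxPerLeaf) {
      throw new LeafTerminated(); // skip the remaining docs of this leaf only
    }
    out.add(leaf + ":" + doc);
    seen++;
  }
}

final class EarlyTerminationSketch {
  /** Imitates the documented searcher behavior: swallow the exception, move to the next leaf. */
  static List<String> search(int[] docsPerLeaf, int maxPerLeaf) {
    List<String> out = new ArrayList<>();
    for (int leaf = 0; leaf < docsPerLeaf.length; leaf++) {
      CappedLeafCollector collector = new CappedLeafCollector(leaf, maxPerLeaf, out);
      try {
        for (int doc = 0; doc < docsPerLeaf[leaf]; doc++) {
          collector.collect(doc);
        }
      } catch (LeafTerminated e) {
        // Terminated early: the remaining docs of this leaf are skipped.
      }
    }
    return out;
  }
}
```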
protected void doSetNextReader(LeafReaderContext readerContext) throws java.io.IOException
This method is called before collecting the documents of readerContext.
Overrides:
doSetNextReader in class SimpleCollector
Throws:
java.io.IOException
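doSetNextReader is where a collector can record per-segment state, and this class's fields include docBase. Because collect receives unbased (segment-relative) docIDs, adding the current leaf's docBase gives the index-wide docID. The following is a minimal sketch of that arithmetic. It assumes leaves are ordered as in the top-level reader, with each leaf's docBase equal to the sum of maxDoc over the preceding leaves. The class and method names are hypothetical.

```java
/** Sketch of segment-relative ("unbased") vs. index-wide docIDs; names are hypothetical. */
final class DocBaseSketch {
  /** Each leaf's docBase is the total maxDoc of all leaves before it. */
  static int[] docBases(int[] maxDocPerLeaf) {
    int[] bases = new int[maxDocPerLeaf.length];
    int sum = 0;
    for (int i = 0; i < maxDocPerLeaf.length; i++) {
      bases[i] = sum;
      sum += maxDocPerLeaf[i];
    }
    return bases;
  }

  /** Typical use once doSetNextReader has stored the current leaf's docBase. */
  static int toGlobalDoc(int docBase, int segmentDoc) {
    return docBase + segmentDoc;
  }
}
```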