public final class FlattenGraphFilter extends TokenFilter
Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths. Every path through the graph touches every node. This is necessary when indexing a graph token stream, because the index does not save PositionLengthAttribute and so it cannot preserve the graph structure. However, at search time, query parsers can correctly handle the graph and this token filter should not be used.
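In the attribute model referred to here, a token's start node is the running sum of its position increments (starting from -1), and its end node is that start plus its PositionLengthAttribute value. The sketch below decodes a token list into graph arcs under that reading. The TokenGraphEncoding class, its records, and the token values for an assumed synonym rule wifi -> "wireless network" are hypothetical, chosen only to make the graph visible (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy (not a Lucene class): decodes a graph token stream into arcs. */
final class TokenGraphEncoding {
    /** One token as reported by its position increment and position length. */
    record Token(String term, int posInc, int posLen) {}

    /** One graph arc: the token leaves node {@code from} and arrives at node {@code to}. */
    record Arc(String term, int from, int to) {}

    static List<Arc> toArcs(List<Token> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // so that a first token with increment 1 starts at node 0
        for (Token t : tokens) {
            pos += t.posInc();                                  // start node: running sum of increments
            arcs.add(new Arc(t.term(), pos, pos + t.posLen())); // end node: start plus length
        }
        return arcs;
    }

    public static void main(String[] args) {
        // Assumed stream for "wifi works" after a hypothetical synonym rule wifi -> "wireless network".
        List<Token> tokens = List.of(
                new Token("wireless", 1, 1),
                new Token("wifi", 0, 2),
                new Token("network", 1, 1),
                new Token("works", 1, 1));
        for (Arc arc : toArcs(tokens)) {
            System.out.println(arc);
        }
    }
}
```

"wifi" leaves node 0 and lands on node 2, skipping node 1. An index that records only start positions keeps the 0 and forgets the 2.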
If the graph was not already flat to start, this is likely a lossy process: it will often cause the graph to accept token sequences it should reject, and to reject token sequences it should accept.
However, when applying synonyms during indexing, this is necessary because Lucene does not index a graph in any case, so the indexing process is already lossy (it ignores the PositionLengthAttribute).
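To see why ignoring PositionLengthAttribute is lossy, the toy below compares phrase matching on a token graph with phrase matching on start positions alone. Everything in it is hypothetical: the LossyIndexDemo class, the assumed "wifi works" graph with the synonym path "wireless network", and the simplified notion of a phrase match (Java 16+ for records).

```java
import java.util.List;

/** Hypothetical toy (not Lucene code): graph phrase matching vs. matching once lengths are dropped. */
final class LossyIndexDemo {
    record Arc(String term, int from, int to) {}

    // Assumed token graph for "wifi works" with the synonym path "wireless network".
    static final List<Arc> GRAPH = List.of(
            new Arc("wireless", 0, 1), new Arc("network", 1, 2),
            new Arc("wifi", 0, 2), new Arc("works", 2, 3));

    /** Graph semantics: the (non-empty) phrase must spell a chain of connected arcs. */
    static boolean graphMatches(List<String> phrase) {
        for (Arc first : GRAPH) {
            if (follows(first, phrase, 0)) {
                return true;
            }
        }
        return false;
    }

    private static boolean follows(Arc arc, List<String> phrase, int i) {
        if (!arc.term().equals(phrase.get(i))) {
            return false;
        }
        if (i == phrase.size() - 1) {
            return true;
        }
        for (Arc next : GRAPH) {
            if (next.from() == arc.to() && follows(next, phrase, i + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Index semantics: only start positions survive, so terms must sit at consecutive positions. */
    static boolean indexMatches(List<String> phrase) {
        int maxPos = GRAPH.stream().mapToInt(Arc::from).max().orElse(-1);
        for (int start = 0; start <= maxPos; start++) {
            boolean ok = true;
            for (int i = 0; i < phrase.size() && ok; i++) {
                final int pos = start + i;
                final String term = phrase.get(i);
                ok = GRAPH.stream().anyMatch(a -> a.from() == pos && a.term().equals(term));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (List<String> q : List.of(List.of("wifi", "works"), List.of("wifi", "network"))) {
            System.out.println(q + " graph=" + graphMatches(q) + " index=" + indexMatches(q));
        }
    }
}
```

On the graph, "wifi works" matches and "wifi network" does not. Once lengths are dropped both answers flip, which is the accept-wrong, reject-right failure the preceding paragraphs describe.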
Modifier and Type | Class and Description |
---|---|
private static class | FlattenGraphFilter.InputNode: Holds all tokens leaving a given input position. |
private static class | FlattenGraphFilter.OutputNode: Gathers up merged input positions into a single output position, only for the current "frontier" of nodes we've seen but can't yet output because they are not frozen. |
Nested classes inherited from class AttributeSource: AttributeSource.State
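The OutputNode description mentions merging several input positions into one output position. As a rough illustration only, and NOT the algorithm FlattenGraphFilter implements (its description indicates it works incrementally over a frontier of unfrozen nodes), the hypothetical ToyFlatten below sees the whole graph at once and maps each input node to the length of the longest path reaching it. As a result, a node that exists only on a side path shares an output position with a main-chain node. The example graph (input "wi fi" with an assumed side path "wireless network") is invented for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy flattening; not Lucene's FlattenGraphFilter algorithm. */
final class ToyFlatten {
    record Arc(String term, int from, int to) {}

    /** Maps each node to the longest path length from startNode; assumes an acyclic graph. */
    static Map<Integer, Integer> outputPositions(List<Arc> arcs, int startNode) {
        Set<Integer> nodes = new HashSet<>();
        for (Arc a : arcs) {
            nodes.add(a.from());
            nodes.add(a.to());
        }
        Map<Integer, Integer> depth = new HashMap<>();
        depth.put(startNode, 0);
        // Relaxing every arc |nodes| - 1 times is enough for any acyclic graph.
        for (int pass = 1; pass < nodes.size(); pass++) {
            for (Arc a : arcs) {
                Integer d = depth.get(a.from());
                if (d != null) {
                    depth.merge(a.to(), d + 1, Integer::max);
                }
            }
        }
        return depth;
    }

    /** Rewrites arcs onto output positions; assumes every node is reachable from startNode. */
    static List<Arc> flatten(List<Arc> arcs, int startNode) {
        Map<Integer, Integer> out = outputPositions(arcs, startNode);
        List<Arc> result = new ArrayList<>();
        for (Arc a : arcs) {
            result.add(new Arc(a.term(), out.get(a.from()), out.get(a.to())));
        }
        return result;
    }

    public static void main(String[] args) {
        // Main chain 0 -wi-> 1 -fi-> 2 plus side path 0 -wireless-> 3 -network-> 2 (assumed example).
        List<Arc> graph = List.of(
                new Arc("wi", 0, 1), new Arc("fi", 1, 2),
                new Arc("wireless", 0, 3), new Arc("network", 3, 2));
        System.out.println(outputPositions(graph, 0)); // {0=0, 1=1, 2=2, 3=1}
        flatten(graph, 0).forEach(System.out::println);
    }
}
```

Input nodes 1 and 3 both map to output position 1, so the flattened form also admits "wi network" and "wireless fi", sequences the original graph never contained.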
Modifier and Type | Field and Description |
---|---|
private boolean | done |
private int | finalOffset |
private int | finalPosInc |
private int | inputFrom: Which input node the last seen token leaves from |
private RollingBuffer<FlattenGraphFilter.InputNode> | inputNodes |
private int | lastOutputFrom |
private int | lastStartOffset |
private int | maxLookaheadUsed |
private OffsetAttribute | offsetAtt |
private int | outputFrom: We are currently releasing tokens leaving from this output node |
private RollingBuffer<FlattenGraphFilter.OutputNode> | outputNodes |
private PositionIncrementAttribute | posIncAtt |
private PositionLengthAttribute | posLenAtt |
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
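The inputNodes and outputNodes fields are typed RollingBuffer, which suggests storage addressed by an ever-increasing position in which old positions can be released once they are no longer needed. The hypothetical ToyRollingBuffer below sketches that general idea in plain Java. It is not Lucene's RollingBuffer, its method names are illustrative only, and unlike a pooling implementation it allocates a fresh item for each new position.

```java
import java.util.function.Supplier;

/**
 * Hypothetical toy (not Lucene's RollingBuffer): behaves like an ever-growing array
 * indexed by absolute position, but stores only the retained window in a circular array.
 */
final class ToyRollingBuffer<T> {
    private final Supplier<T> factory;
    private Object[] slots = new Object[4];
    private int nextPos = 0; // one past the highest position created so far
    private int count = 0;   // retained positions are [nextPos - count, nextPos)

    ToyRollingBuffer(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the item at pos, creating items up to pos if needed. */
    T get(int pos) {
        if (pos < nextPos - count) {
            throw new IllegalArgumentException("position " + pos + " was already freed");
        }
        while (pos >= nextPos) {
            if (count == slots.length) {
                grow();
            }
            slots[nextPos % slots.length] = factory.get();
            nextPos++;
            count++;
        }
        @SuppressWarnings("unchecked")
        T item = (T) slots[pos % slots.length];
        return item;
    }

    /** Forgets every position before pos; their slots become reusable. */
    void freeBefore(int pos) {
        if (pos < nextPos - count || pos > nextPos) {
            throw new IllegalArgumentException("cannot free before " + pos);
        }
        count = nextPos - pos;
    }

    int retained() {
        return count;
    }

    // Doubles capacity, re-placing each retained position at its new circular index.
    private void grow() {
        Object[] bigger = new Object[slots.length * 2];
        for (int p = nextPos - count; p < nextPos; p++) {
            bigger[p % bigger.length] = slots[p % slots.length];
        }
        slots = bigger;
    }

    public static void main(String[] args) {
        ToyRollingBuffer<StringBuilder> buf = new ToyRollingBuffer<>(StringBuilder::new);
        for (int i = 0; i < 10; i++) {
            buf.get(i).append(i);
        }
        buf.freeBefore(8); // positions 0..7 are done; only 8 and 9 stay
        System.out.println(buf.retained() + " " + buf.get(9)); // 2 9
    }
}
```

Memory stays proportional to the window between the oldest unreleased position and the newest one, not to the length of the stream.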
Constructor and Description |
---|
FlattenGraphFilter(TokenStream in) |
Modifier and Type | Method and Description |
---|---|
void | end(): This method is called by the consumer after the last token has been consumed, after TokenStream.incrementToken() returned false (using the new TokenStream API). |
int | getMaxLookaheadUsed(): For testing |
boolean | incrementToken(): Consumers (i.e., IndexWriter) use this method to advance the stream to the next token. |
private boolean | releaseBufferedToken() |
void | reset(): This method is called by a consumer before it begins consumption using TokenStream.incrementToken(). |
Methods inherited from class TokenFilter: close
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
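The summaries of reset(), incrementToken() and end(), plus the inherited close(), imply a fixed consumer order: reset(), then incrementToken() until it returns false, then end(), then close(). The hypothetical ToyStream below (plain Java, no Lucene types) only models that ordering; its state names and error messages are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy stream (not a Lucene class) that enforces the consumer call order. */
final class ToyStream implements AutoCloseable {
    private enum State { CREATED, CONSUMING, ENDED, CLOSED }

    private final List<String> source;
    private Iterator<String> it;
    private State state = State.CREATED;
    private String term; // stands in for a term attribute

    ToyStream(List<String> source) {
        this.source = source;
    }

    void reset() {
        if (state != State.CREATED && state != State.CLOSED) {
            throw new IllegalStateException("reset() called again without close()");
        }
        it = source.iterator();
        state = State.CONSUMING;
    }

    boolean incrementToken() {
        if (state != State.CONSUMING) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (!it.hasNext()) {
            return false;
        }
        term = it.next();
        return true;
    }

    String term() {
        return term;
    }

    void end() {
        if (state != State.CONSUMING) {
            throw new IllegalStateException("end() must follow consumption");
        }
        state = State.ENDED;
    }

    @Override
    public void close() {
        state = State.CLOSED;
    }

    /** Consumer side: reset, drain with incrementToken, end, then close (via try-with-resources). */
    static List<String> consume(ToyStream ts) {
        List<String> terms = new ArrayList<>();
        try (ts) {
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(ts.term());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyStream ts = new ToyStream(List.of("wifi", "works"));
        System.out.println(consume(ts)); // [wifi, works]
        System.out.println(consume(ts)); // reusable after close(): [wifi, works]
    }
}
```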
private final RollingBuffer<FlattenGraphFilter.InputNode> inputNodes
private final RollingBuffer<FlattenGraphFilter.OutputNode> outputNodes
private final PositionIncrementAttribute posIncAtt
private final PositionLengthAttribute posLenAtt
private final OffsetAttribute offsetAtt
private int inputFrom
private int outputFrom
private boolean done
private int lastOutputFrom
private int finalOffset
private int finalPosInc
private int maxLookaheadUsed
private int lastStartOffset
public FlattenGraphFilter(TokenStream in)
private boolean releaseBufferedToken()
public boolean incrementToken() throws java.io.IOException
Description copied from class: TokenStream
Consumers (i.e., IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate AttributeImpls with the attributes of the next token.
The producer must make no assumptions about the attributes after the method has returned: the caller may arbitrarily change them. If the producer needs to preserve the state for subsequent calls, it can use AttributeSource.captureState() to create a copy of the current attribute state.
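A buffering filter has to copy attribute values rather than keep references to them, because the producer overwrites the same attribute objects on every token. The hypothetical CaptureStateDemo below shows the difference with a stand-in TermAttr class; the copy() call plays the role that captureState() plays in Lucene, but none of these names are Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy (not Lucene code): buffering a shared mutable attribute vs. buffering copies. */
final class CaptureStateDemo {
    /** Stand-in for a mutable term attribute; not a Lucene class. */
    static final class TermAttr {
        String term;

        TermAttr copy() {
            TermAttr c = new TermAttr();
            c.term = term;
            return c;
        }
    }

    static List<String> buffer(List<String> input, boolean snapshot) {
        TermAttr shared = new TermAttr(); // the producer reuses this single instance
        List<TermAttr> buffered = new ArrayList<>();
        for (String t : input) {
            shared.term = t; // each "incrementToken" overwrites the attribute in place
            buffered.add(snapshot ? shared.copy() : shared);
        }
        List<String> out = new ArrayList<>();
        for (TermAttr a : buffered) {
            out.add(a.term);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = List.of("wireless", "network");
        System.out.println(buffer(in, false)); // [network, network]
        System.out.println(buffer(in, true));  // [wireless, network]
    }
}
```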
This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to AttributeSource.addAttribute(Class) and AttributeSource.getAttribute(Class), references to all AttributeImpls that this stream uses should be retrieved during instantiation.
To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in TokenStream.incrementToken().
Specified by: incrementToken in class TokenStream
Throws: java.io.IOException
public void end() throws java.io.IOException
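Description copied from class: TokenFilter
This method is called by the consumer after the last token has been consumed, after TokenStream.incrementToken() returned false (using the new TokenStream API). Streams implementing the old API should upgrade to use this feature.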
This method can be used to perform any end-of-stream operations, such as setting the final offset of a stream. The final offset of a stream might differ from the offset of the last token, e.g., when one or more whitespace characters follow the last token and a WhitespaceTokenizer was used.
Additionally, any skipped positions (such as those removed by a StopFilter) can be applied to the position increment, as can any adjustment of other attributes where the end-of-stream value may be important.
If you override this method, always call super.end().
NOTE: The default implementation chains the call to the input TokenStream, so be sure to call super.end() first when overriding this method.
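A hypothetical illustration of the two end-of-stream values mentioned above: whitespace splitting plus a stop set, both hand-rolled here rather than Lucene's WhitespaceTokenizer and StopFilter. For the assumed input "cat on the  ", the last kept token ends at offset 3, the final offset is the full input length 12, and the two trailing stopped positions surface as a final position increment of 2 (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical toy (not Lucene code): end-of-stream offset and position bookkeeping. */
final class EndOfStreamDemo {
    record Result(List<String> kept, int lastEndOffset, int finalOffset, int finalPosInc) {}

    static Result analyze(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        int lastEnd = 0;
        int skipped = 0; // stopped positions since the last kept token
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (start == i) {
                break; // only trailing whitespace was left
            }
            String term = text.substring(start, i);
            if (stopWords.contains(term)) {
                skipped++;
            } else {
                kept.add(term);
                lastEnd = i;
                skipped = 0; // a kept token would carry the gap in its own increment
            }
        }
        // At end(): report the full input length as the final offset and the
        // trailing gap as the final position increment.
        return new Result(kept, lastEnd, text.length(), skipped);
    }

    public static void main(String[] args) {
        System.out.println(analyze("cat on the  ", Set.of("on", "the")));
    }
}
```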
Overrides: end in class TokenFilter
Throws: java.io.IOException - If an I/O error occurs
public void reset() throws java.io.IOException
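Description copied from class: TokenFilter
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().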
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
NOTE: The default implementation chains the call to the input TokenStream, so be sure to call super.reset() when overriding this method.
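FlattenGraphFilter keeps per-stream fields such as inputFrom, outputFrom and done, so it is the kind of stateful implementation this contract targets. The hypothetical ResetDemo below (plain Java, no Lucene types) mirrors the two rules stated here: chain to the upstream stage first, as super.reset() does, then clear this stage's own state so a reused instance behaves as if freshly created.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy (not Lucene classes) mirroring the reset() chaining contract. */
final class ResetDemo {
    /** Upstream stage: replays a fixed list of terms. */
    static class Source {
        private final List<String> terms;
        private int next;

        Source(List<String> terms) {
            this.terms = terms;
        }

        void reset() {
            next = 0;
        }

        String nextTerm() {
            return next < terms.size() ? terms.get(next++) : null;
        }
    }

    /** Stateful filter: tags each term with its ordinal. */
    static class NumberingFilter {
        private final Source input;
        private int ordinal; // per-stream state that must not leak into the next use

        NumberingFilter(Source input) {
            this.input = input;
        }

        void reset() {
            input.reset(); // chain upstream first, as super.reset() would
            ordinal = 0;   // then clear this stage's own state
        }

        String next() {
            String t = input.nextTerm();
            return t == null ? null : t + "#" + (ordinal++);
        }
    }

    static List<String> drain(NumberingFilter f) {
        List<String> out = new ArrayList<>();
        f.reset();
        for (String t = f.next(); t != null; t = f.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        NumberingFilter f = new NumberingFilter(new Source(List.of("a", "b")));
        System.out.println(drain(f)); // [a#0, b#1]
        System.out.println(drain(f)); // [a#0, b#1] again, because reset() cleared ordinal
    }
}
```

Dropping either line in NumberingFilter.reset() breaks reuse: without the upstream call the second drain yields nothing, and without clearing ordinal the numbering continues from 2.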
Overrides: reset in class TokenFilter
Throws: java.io.IOException
public int getMaxLookaheadUsed()