public final class CommonGramsFilter extends TokenFilter
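Constructs bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. This is achieved through the use of PositionIncrementAttribute.setPositionIncrement(int). Bigrams have a type of GRAM_TYPE.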
Example:
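A minimal, self-contained sketch of the overlay idea, not the filter's actual implementation. It uses only the standard library. The `Tok` class, the `commonGrams` helper, the `"word"` type for single terms, the `"gram"` type value, and the `_` separator are assumptions made for illustration. Each word is emitted at its own position; when a word or its successor is in the common-word set, a bigram follows with position increment 0 so that it shares the position of the single term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: plain Java, no Lucene classes involved.
final class CommonGramsSketch {
    static final String GRAM_TYPE = "gram"; // assumed value of the bigram type
    static final char SEPARATOR = '_';      // assumed separator between the two words

    /** Hypothetical token: term text, position increment, and type. */
    static final class Tok {
        final String term;
        final int posInc;
        final String type;

        Tok(String term, int posInc, String type) {
            this.term = term;
            this.posInc = posInc;
            this.type = type;
        }

        @Override
        public String toString() {
            return term + " (posInc=" + posInc + ", type=" + type + ")";
        }
    }

    /** Emits every word, plus an overlaid bigram wherever a common word is involved. */
    static List<Tok> commonGrams(List<String> words, Set<String> commonWords) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String current = words.get(i);
            // The single term always advances the position by one.
            out.add(new Tok(current, 1, "word"));
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    // Position increment 0 stacks the bigram on the single term.
                    out.add(new Tok(current + SEPARATOR + next, 0, GRAM_TYPE));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the"));
        for (Tok t : commonGrams(Arrays.asList("the", "quick", "brown", "fox"), common)) {
            System.out.println(t);
        }
    }
}
```

With "the" as the only common word, the input "the quick brown fox" yields five tokens in this sketch: "the", the bigram "the_quick" at the same position, then "quick", "brown", and "fox".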
Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Modifier and Type | Field and Description |
---|---|
private java.lang.StringBuilder | buffer |
private CharArraySet | commonWords |
static java.lang.String | GRAM_TYPE |
private int | lastStartOffset |
private boolean | lastWasCommon |
private OffsetAttribute | offsetAttribute |
private PositionIncrementAttribute | posIncAttribute |
private PositionLengthAttribute | posLenAttribute |
private AttributeSource.State | savedState |
private static char | SEPARATOR |
private CharTermAttribute | termAttribute |
private TypeAttribute | typeAttribute |
Fields inherited from class TokenFilter: input
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor and Description |
---|
CommonGramsFilter(TokenStream input, CharArraySet commonWords): Construct a token stream filtering the given input using a Set of common words to create bigrams. |
Modifier and Type | Method and Description |
---|---|
private void | gramToken(): Constructs a compound token. |
boolean | incrementToken(): Inserts bigrams for common words into a token stream. |
private boolean | isCommon(): Determines if the current token is a common term. |
void | reset(): This method is called by a consumer before it begins consumption using TokenStream.incrementToken(). |
private void | saveTermBuffer(): Saves this information to form the left part of a gram. |
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public static final java.lang.String GRAM_TYPE
private static final char SEPARATOR
private final CharArraySet commonWords
private final java.lang.StringBuilder buffer
private final CharTermAttribute termAttribute
private final OffsetAttribute offsetAttribute
private final TypeAttribute typeAttribute
private final PositionIncrementAttribute posIncAttribute
private final PositionLengthAttribute posLenAttribute
private int lastStartOffset
private boolean lastWasCommon
private AttributeSource.State savedState
public CommonGramsFilter(TokenStream input, CharArraySet commonWords)
Parameters:
input - TokenStream input in filter chain
commonWords - The set of common words.
public boolean incrementToken() throws java.io.IOException
Specified by: incrementToken in class TokenStream
Throws: java.io.IOException
public void reset() throws java.io.IOException
Description copied from class: TokenFilter
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
NOTE: The default implementation chains the call to the input TokenStream, so be sure to call super.reset() when overriding this method.
Overrides: reset in class TokenFilter
Throws: java.io.IOException
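The reset contract can be illustrated with a small standard-library sketch. None of the classes below are Lucene classes: `SketchStream`, `ListStream`, `SketchFilter`, and `PairingFilter` are hypothetical stand-ins, and the `previous` field only plays a role comparable to per-stream state such as `lastWasCommon` or `savedState`. The point shown is that an overriding `reset()` both chains to `super.reset()` and clears its own state, so the filter can be consumed again as if freshly created.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative only: hypothetical stand-ins, not the Lucene API.
final class ResetContractSketch {

    /** Hypothetical stream: next() returns null once exhausted. */
    abstract static class SketchStream {
        abstract String next();
        void reset() { }
    }

    /** Hypothetical source that replays a fixed list of words. */
    static final class ListStream extends SketchStream {
        private final List<String> words;
        private Iterator<String> it;

        ListStream(List<String> words) {
            this.words = words;
            this.it = words.iterator();
        }

        @Override String next() { return it.hasNext() ? it.next() : null; }

        @Override void reset() { it = words.iterator(); }
    }

    /** Hypothetical filter base: the default reset() chains to the wrapped input. */
    static class SketchFilter extends SketchStream {
        protected final SketchStream input;

        SketchFilter(SketchStream input) { this.input = input; }

        @Override String next() { return input.next(); }

        @Override void reset() { input.reset(); }
    }

    /** Stateful filter that joins each word with the one before it. */
    static final class PairingFilter extends SketchFilter {
        private String previous; // state that must not leak into the next use

        PairingFilter(SketchStream input) { super(input); }

        @Override String next() {
            String current = input.next();
            if (current == null) {
                return null;
            }
            String out = (previous == null) ? current : previous + "_" + current;
            previous = current;
            return out;
        }

        @Override void reset() {
            super.reset();   // rewinds the wrapped input
            previous = null; // clears this filter's own state
        }
    }

    public static void main(String[] args) {
        PairingFilter filter = new PairingFilter(new ListStream(Arrays.asList("a", "b")));
        for (int pass = 0; pass < 2; pass++) {
            filter.reset();
            for (String t = filter.next(); t != null; t = filter.next()) {
                System.out.println("pass " + pass + ": " + t);
            }
        }
    }
}
```

If `PairingFilter.reset()` skipped `super.reset()`, the wrapped source would stay exhausted and the second pass would produce nothing; if it skipped clearing `previous`, the second pass would start with a stale pair.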
private boolean isCommon()
Returns: true if the current token is a common term, false otherwise
private void saveTermBuffer()
private void gramToken()