public abstract class Tokenizer extends TokenStream
This is an abstract class; subclasses must override TokenStream.incrementToken()
NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
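The following is a minimal sketch of why the clear-before-set rule matters. It does not use Lucene: ModelStream, its term and type fields, and the types helper are hypothetical stand-ins, assuming only that clearing restores every attribute to its default. When a token sets an attribute that later tokens leave alone, skipping the clear lets the old value leak forward.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an attribute-carrying stream; NOT Lucene's AttributeSource.
class ModelStream {
    String term = "";      // models a term attribute
    String type = "word";  // models a type attribute whose default is "word"

    private final String[] words;
    private final boolean clearFirst;
    private int pos = 0;

    ModelStream(String text, boolean clearFirst) {
        this.words = text.split(" ");
        this.clearFirst = clearFirst;
    }

    // Models clearAttributes(): restore every attribute to its default value.
    void clearAttributes() {
        term = "";
        type = "word";
    }

    // Models incrementToken(): advance to the next token, or return false at the end.
    boolean incrementToken() {
        if (pos >= words.length) {
            return false;
        }
        if (clearFirst) {
            clearAttributes();
        }
        term = words[pos++];
        // Only numeric tokens set the type; all others rely on the default.
        if (term.matches("\\d+")) {
            type = "num";
        }
        return true;
    }

    // Collects "term/type" for every token in the text.
    static List<String> types(String text, boolean clearFirst) {
        ModelStream stream = new ModelStream(text, clearFirst);
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term + "/" + stream.type);
        }
        return out;
    }
}
```

In this model, ModelStream.types("a 42 b", true) yields [a/word, 42/num, b/word], while passing false yields [a/word, 42/num, b/num]: the token after the number inherits a stale type.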
Nested classes inherited from class AttributeSource: AttributeSource.State
Modifier and Type | Field and Description |
---|---|
private static java.io.Reader | ILLEGAL_STATE_READER |
protected java.io.Reader | input: The text source for this Tokenizer. |
private java.io.Reader | inputPending: Pending reader, not actually assigned to input until reset(). |
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Modifier | Constructor and Description |
---|---|
protected | Tokenizer(): Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input. |
protected | Tokenizer(AttributeFactory factory): Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input. |
Modifier and Type | Method and Description |
---|---|
void | close(): Releases resources associated with this stream. |
protected int | correctOffset(int currentOff): Return the corrected offset. |
void | reset(): This method is called by a consumer before it begins consumption using TokenStream.incrementToken(). |
void | setReader(java.io.Reader input): Expert: Set a new reader on the Tokenizer. |
(package private) void | setReaderTestPoint() |
Methods inherited from class TokenStream: end, incrementToken
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
protected java.io.Reader input
private java.io.Reader inputPending
private static final java.io.Reader ILLEGAL_STATE_READER
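protected Tokenizer()
Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input.
protected Tokenizer(AttributeFactory factory)
Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input.
Parameters: factory - attribute factory.
public void close() throws java.io.IOException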
Releases resources associated with this stream.
If you override this method, always call super.close(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on reuse).
NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class TokenStream
Throws: java.io.IOException
protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharFilter subclass, this method calls CharFilter.correctOffset(int); otherwise it returns currentOff.
Parameters: currentOff - offset as seen in the output
See Also: CharFilter.correctOffset(int)
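The delegation rule for correctOffset can be sketched in plain Java. This is an illustrative model, not Lucene code: OffsetCorrectingReader stands in for CharFilter, PrefixStrippingReader is a made-up filter that pretends a fixed number of leading characters were removed from the original text, and OffsetModel.correctOffset takes the reader as an explicit parameter rather than reading the input field.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for CharFilter: a Reader that can map an offset in its
// output back to an offset in the original text.
abstract class OffsetCorrectingReader extends FilterReader {
    OffsetCorrectingReader(Reader in) {
        super(in);
    }

    abstract int correctOffset(int currentOff);
}

// Made-up filter that behaves as if `stripped` leading characters were removed,
// so every output offset is shifted by that amount in the original text.
class PrefixStrippingReader extends OffsetCorrectingReader {
    private final int stripped;

    PrefixStrippingReader(Reader in, int stripped) {
        super(in);
        this.stripped = stripped;
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff + stripped;
    }
}

class OffsetModel {
    // Mirrors the documented rule: delegate when the input can correct offsets,
    // otherwise return the offset unchanged.
    static int correctOffset(Reader input, int currentOff) {
        if (input instanceof OffsetCorrectingReader) {
            return ((OffsetCorrectingReader) input).correctOffset(currentOff);
        }
        return currentOff;
    }
}
```

With a plain StringReader the model returns the offset unchanged, so OffsetModel.correctOffset(new StringReader("abc"), 5) is 5. Wrapping the same reader in new PrefixStrippingReader(..., 3) gives 8.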
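public final void setReader(java.io.Reader input)
Expert: Set a new reader on the Tokenizer. The reader is held as pending and is not assigned to input until reset().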
public void reset() throws java.io.IOException
Description copied from class: TokenStream
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
Overrides: reset in class TokenStream
Throws: java.io.IOException
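The interplay of setReader, reset, and close described above can be modeled with a sentinel reader. The sketch below is a simplified model and not the Lucene source: ModelTokenizer, readChar (a stand-in for incrementToken), the exception messages, and LifecycleDemo are all assumptions made for illustration. In particular, rejecting setReader while a reader is still open is one plausible way to produce the documented IllegalStateException on reuse.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Simplified model of the pending-reader lifecycle; NOT the Lucene implementation.
class ModelTokenizer {
    // Sentinel reader: any attempt to read from it means reset() was skipped.
    private static final Reader ILLEGAL_STATE_READER = new Reader() {
        @Override
        public int read(char[] buf, int off, int len) {
            throw new IllegalStateException("reset() or close() call missing");
        }

        @Override
        public void close() {}
    };

    protected Reader input = ILLEGAL_STATE_READER;
    private Reader inputPending = ILLEGAL_STATE_READER;

    public final void setReader(Reader reader) {
        if (input != ILLEGAL_STATE_READER) {
            // Assumed check: the previous reader was never closed.
            throw new IllegalStateException("close() call missing");
        }
        inputPending = reader; // held as pending until reset()
    }

    public void reset() throws IOException {
        input = inputPending;
        inputPending = ILLEGAL_STATE_READER;
    }

    public void close() throws IOException {
        input.close();
        input = inputPending = ILLEGAL_STATE_READER;
    }

    // Hypothetical stand-in for incrementToken(): reads one character from input.
    public int readChar() throws IOException {
        return input.read();
    }
}

class LifecycleDemo {
    // Walks through the lifecycle and records what happens at each step.
    static String run() {
        try {
            StringBuilder log = new StringBuilder();
            ModelTokenizer t = new ModelTokenizer();
            t.setReader(new StringReader("ab"));
            try {
                t.readChar(); // consumed before reset(): still the sentinel
            } catch (IllegalStateException e) {
                log.append("ISE,");
            }
            t.reset();
            log.append((char) t.readChar()).append(',');
            try {
                t.setReader(new StringReader("cd")); // reuse without close()
            } catch (IllegalStateException e) {
                log.append("ISE,");
            }
            t.close();
            t.setReader(new StringReader("cd"));
            t.reset();
            log.append((char) t.readChar());
            return log.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model LifecycleDemo.run() returns "ISE,a,ISE,c": reading before reset() fails, reading after reset() succeeds, reuse without close() fails, and reuse after close() succeeds. A subclass that overrides reset() or close() without calling the super method would leave input in the wrong state, which is the failure the documentation warns about.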
void setReaderTestPoint()