org.apache.lucene.analysis

Class CharTokenizer

public abstract class CharTokenizer extends Tokenizer

An abstract base class for simple, character-oriented tokenizers.
Constructor Summary
CharTokenizer(Reader input)
Method Summary
protected abstract boolean isTokenChar(char c)
Returns true iff a character should be included in a token.
Token next()
Returns the next token in the stream, or null at end of stream.
protected char normalize(char c)
Called on each token character to normalize it before it is added to the token.

Constructor Detail

CharTokenizer

public CharTokenizer(Reader input)

Method Detail

isTokenChar

protected abstract boolean isTokenChar(char c)
Returns true iff a character should be included in a token. The tokenizer emits each run of adjacent characters that satisfy this predicate as one token. Characters for which it returns false define token boundaries and are not included in any token.
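
The boundary rule described above can be illustrated with a small standalone sketch. It does not use Lucene and is not the library's implementation: the classes SketchCharTokenizer and SketchLetterTokenizer and the helper tokenize are hypothetical names introduced here, and tokens are returned as plain strings rather than Token objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Lucene source.
abstract class SketchCharTokenizer {
    private final Reader input;

    SketchCharTokenizer(Reader input) {
        this.input = input;
    }

    // Plays the role of isTokenChar: true if c belongs inside a token.
    protected abstract boolean isTokenChar(char c);

    // Returns the next run of token characters, or null at end of stream.
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = input.read()) != -1) {
            char c = (char) ch;
            if (isTokenChar(c)) {
                sb.append(c);          // extend the current token
            } else if (sb.length() > 0) {
                break;                 // a boundary character ends the token
            }
            // otherwise: boundary character before any token started, skip it
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Treats letters as token characters; everything else is a boundary.
class SketchLetterTokenizer extends SketchCharTokenizer {
    SketchLetterTokenizer(Reader input) {
        super(input);
    }

    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    // Convenience helper (an assumption of this sketch): collect all tokens.
    static List<String> tokenize(String text) {
        SketchCharTokenizer t = new SketchLetterTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            for (String tok = t.next(); tok != null; tok = t.next()) {
                out.add(tok);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this sketch, SketchLetterTokenizer.tokenize("Hello, wide world!") yields the three tokens Hello, wide and world; the comma, spaces and exclamation mark only separate tokens and never appear in them.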

next

public final Token next()
Returns the next token in the stream, or null at end of stream.

normalize

protected char normalize(char c)
Called on each token character to normalize it before it is added to the token. The default implementation returns the character unchanged. Subclasses may override this to, e.g., lowercase tokens.
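
A lowercasing override might look roughly like the following standalone sketch. It mirrors the two hooks described here without depending on Lucene: NormalizingSketchTokenizer, LowerCaseSketchTokenizer and tokenize are hypothetical names, and tokens are plain strings rather than Token objects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Lucene source.
abstract class NormalizingSketchTokenizer {
    private final Reader input;

    NormalizingSketchTokenizer(Reader input) {
        this.input = input;
    }

    // True if c belongs inside a token.
    protected abstract boolean isTokenChar(char c);

    // Default hook: return the character unchanged.
    protected char normalize(char c) {
        return c;
    }

    // Returns the next normalized token, or null at end of stream.
    final String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = input.read()) != -1) {
            char c = (char) ch;
            if (isTokenChar(c)) {
                sb.append(normalize(c)); // normalize before adding to the token
            } else if (sb.length() > 0) {
                break;                   // boundary character ends the token
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Splits on non-letters and lowercases every token character.
class LowerCaseSketchTokenizer extends NormalizingSketchTokenizer {
    LowerCaseSketchTokenizer(Reader input) {
        super(input);
    }

    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }

    // Convenience helper (an assumption of this sketch): collect all tokens.
    static List<String> tokenize(String text) {
        LowerCaseSketchTokenizer t = new LowerCaseSketchTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            for (String tok = t.next(); tok != null; tok = t.next()) {
                out.add(tok);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

In this sketch normalize is applied only to characters that isTokenChar has already accepted, so the predicate sees the raw input and the token receives the normalized form. LowerCaseSketchTokenizer.tokenize("Quick BROWN fox-42") yields quick, brown and fox.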
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.