public final class HMMChineseTokenizerFactory extends TokenizerFactory
HMMChineseTokenizer
Note: this class will currently emit tokens for punctuation. So you should either add
a WordDelimiterFilter after to remove these (with concatenate off), or use the
SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
HMMChineseTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)
Creates a new HMMChineseTokenizerFactory
|
Modifier and Type | Method and Description |
---|---|
Tokenizer |
create(AttributeFactory factory)
Creates a TokenStream of the specified input using the given AttributeFactory
|
availableTokenizers, create, forName, lookupClass, reloadTokenizers
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
public HMMChineseTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)
public Tokenizer create(AttributeFactory factory)
TokenizerFactory
create
in class TokenizerFactory