public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter.
This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimal word length that gets decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="hyphenator.xml" encoding="UTF-8"
            dictionary="dictionary.txt" minWordSize="5"
            minSubwordSize="2" maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
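To make the effect of minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch more concrete, here is a minimal Java sketch. It is not Lucene's implementation: it treats every character offset as a candidate break point instead of using FOP hyphenation patterns, it always requires a dictionary, and it interprets onlyLongestMatch as "longest match per start offset". The class SubwordSketch and its decompose method are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified, dictionary-only illustration of the size parameters.
// Assumption: break points are all character offsets, not hyphenation points.
final class SubwordSketch {

    /** Hypothetical helper, not part of Lucene: returns the dictionary subwords of a word. */
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        // Words shorter than minWordSize are not decomposed at all.
        if (word.length() < minWordSize) {
            return subwords;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            // Only candidates whose length lies in [minSubwordSize, maxSubwordSize] are tried.
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length();
                 len++) {
                String candidate = lower.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // len only grows, so the last hit is the longest
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "ball", "fussball", "pumpe");
        // prints [fuss, fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", dict, 5, 2, 15, false));
        // prints [fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", dict, 5, 2, 15, true));
    }
}
```

With the defaults listed above (5, 2, 15, false), a four-letter word such as "Ball" is returned without subwords because it is shorter than minWordSize.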
See Also: HyphenationCompoundWordTokenFilter
Modifier and Type | Field and Description |
---|---|
private java.lang.String | dictFile |
private CharArraySet | dictionary |
private java.lang.String | encoding |
private java.lang.String | hypFile |
private HyphenationTree | hyphenator |
private int | maxSubwordSize |
private int | minSubwordSize |
private int | minWordSize |
private boolean | onlyLongestMatch |
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
HyphenationCompoundWordTokenFilterFactory(java.util.Map<java.lang.String,java.lang.String> args): Creates a new HyphenationCompoundWordTokenFilterFactory. |
Modifier and Type | Method and Description |
---|---|
TokenFilter | create(TokenStream input): Transform the specified input TokenStream. |
void | inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc). |
Methods inherited from class TokenFilterFactory: availableTokenFilters, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
private CharArraySet dictionary
private HyphenationTree hyphenator
private final java.lang.String dictFile
private final java.lang.String hypFile
private final java.lang.String encoding
private final int minWordSize
private final int minSubwordSize
private final int maxSubwordSize
private final boolean onlyLongestMatch
public HyphenationCompoundWordTokenFilterFactory(java.util.Map<java.lang.String,java.lang.String> args)
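Because the constructor takes a plain Map<String,String>, the documented parameters can be assembled in code the same way the XML attributes are. The sketch below uses only the JDK so that it runs without Lucene on the classpath; the class HyphenationArgsSketch and the method buildArgs are hypothetical names, and the file name "hyphenator.xml" is taken from the configuration example on this page. The values for the optional keys simply restate the documented defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds the argument map
// that the factory constructor documented on this page accepts.
final class HyphenationArgsSketch {

    static Map<String, String> buildArgs(String hyphenatorPath) {
        // "hyphenator" is the only mandatory parameter.
        if (hyphenatorPath == null || hyphenatorPath.isEmpty()) {
            throw new IllegalArgumentException("hyphenator is mandatory");
        }
        Map<String, String> args = new LinkedHashMap<>();
        args.put("hyphenator", hyphenatorPath);
        // Optional parameters, set here to their documented defaults.
        // "dictionary" is omitted, which matches its default of no dictionary.
        args.put("encoding", "UTF-8");
        args.put("minWordSize", "5");
        args.put("minSubwordSize", "2");
        args.put("maxSubwordSize", "15");
        args.put("onlyLongestMatch", "false");
        return args;
    }

    public static void main(String[] argv) {
        Map<String, String> args = buildArgs("hyphenator.xml");
        System.out.println(args);
        // With Lucene available, this map would be handed to
        // new HyphenationCompoundWordTokenFilterFactory(args).
    }
}
```

All values are strings, mirroring the XML attributes; the numeric and boolean parameters are parsed by the factory.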
public void inform(ResourceLoader loader) throws java.io.IOException
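Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Specified by: inform in interface ResourceLoaderAware
Throws: java.io.IOException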
public TokenFilter create(TokenStream input)
Description copied from class: TokenFilterFactory
Transform the specified input TokenStream.
Specified by: create in class TokenFilterFactory