public class StopFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for StopFilter.
<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" format="wordset"/>
  </analyzer>
</fieldType>
All attributes are optional:
- ignoreCase: defaults to false.
- words: the name of a stopwords file to parse. If it is not specified, the factory uses EnglishAnalyzer.ENGLISH_STOP_WORDS_SET.
- format: defines how the words file is parsed, and defaults to wordset. If words is not specified, then format must not be specified.
The valid values for the format option are:
- wordset: This is the default format, which supports one word per line (including any intra-word whitespace) and allows whole-line comments beginning with the "#" character. Blank lines are ignored. See WordlistLoader.getLines for details.
- snowball: This format allows multiple words on each line, and trailing comments may be specified using the vertical line ("|"). Blank lines are ignored. See WordlistLoader.getSnowballWordSet for details.
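The difference between the two formats can be illustrated with a small, standard-library-only sketch. The class and method names (StopWordFormatSketch, parseWordset, parseSnowball) are hypothetical, and this is not the WordlistLoader implementation; it only mirrors the rules described above, under the assumption that lines are trimmed before use.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Simplified illustration of the two stopword file formats (hypothetical, not Lucene code). */
final class StopWordFormatSketch {

    /** wordset: one entry per line, whole-line "#" comments, blank lines skipped. */
    static Set<String> parseWordset(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue; // whole-line comment
            }
            String entry = line.trim();
            if (entry.isEmpty()) {
                continue; // blank line
            }
            words.add(entry); // intra-word whitespace is preserved, e.g. "new york"
        }
        return words;
    }

    /** snowball: several words per line, "|" starts a trailing comment, blank lines skipped. */
    static Set<String> parseSnowball(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            String content = (bar >= 0 ? line.substring(0, bar) : line).trim();
            if (content.isEmpty()) {
                continue; // blank or comment-only line
            }
            words.addAll(Arrays.asList(content.split("\\s+")));
        }
        return words;
    }
}
```

Under these assumptions, the line "new york" yields the single entry "new york" in wordset format but the two entries "new" and "york" in snowball format, which is why the format must match the file being loaded.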
Modifier and Type | Field and Description |
---|---|
private java.lang.String | format |
static java.lang.String | FORMAT_SNOWBALL |
static java.lang.String | FORMAT_WORDSET |
private boolean | ignoreCase |
private java.lang.String | stopWordFiles |
private CharArraySet | stopWords |
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
StopFilterFactory(java.util.Map<java.lang.String,java.lang.String> args): Creates a new StopFilterFactory. |
Modifier and Type | Method and Description |
---|---|
TokenStream | create(TokenStream input): Transform the specified input TokenStream. |
CharArraySet | getStopWords() |
void | inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc). |
boolean | isIgnoreCase() |
Methods inherited from class TokenFilterFactory: availableTokenFilters, forName, lookupClass, normalize, reloadTokenFilters
Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
public static final java.lang.String FORMAT_WORDSET
public static final java.lang.String FORMAT_SNOWBALL
private CharArraySet stopWords
private final java.lang.String stopWordFiles
private final java.lang.String format
private final boolean ignoreCase
public StopFilterFactory(java.util.Map<java.lang.String,java.lang.String> args)
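How the args map relates to the documented attributes can be sketched without any Lucene dependency. The class StopFilterArgsSketch below is hypothetical and does not reproduce the factory's actual code or error messages; it only encodes the documented rules: ignoreCase defaults to false, format defaults to wordset when words is given, and format is rejected when words is absent.

```java
import java.util.Map;

/** Hypothetical sketch of resolving the documented attributes from an args map. */
final class StopFilterArgsSketch {
    final boolean ignoreCase;
    final String words;  // null means the built-in English stop set is used
    final String format; // null when no words file is configured

    StopFilterArgsSketch(Map<String, String> args) {
        this.ignoreCase = Boolean.parseBoolean(args.getOrDefault("ignoreCase", "false"));
        this.words = args.get("words");
        String requested = args.get("format");

        // Documented rule: format must not be specified without words.
        if (words == null && requested != null) {
            throw new IllegalArgumentException("'format' requires 'words': " + requested);
        }

        if (words == null) {
            this.format = null;
        } else {
            String resolved = (requested == null) ? "wordset" : requested;
            if (!resolved.equals("wordset") && !resolved.equals("snowball")) {
                throw new IllegalArgumentException("Unknown 'format': " + resolved);
            }
            this.format = resolved;
        }
    }
}
```

With an empty map this sketch resolves to ignoreCase=false and no words file; with only words set, format resolves to wordset.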
public void inform(ResourceLoader loader) throws java.io.IOException
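Description copied from interface ResourceLoaderAware: Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Specified by: inform in interface ResourceLoaderAware
Throws: java.io.IOException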
public boolean isIgnoreCase()
public CharArraySet getStopWords()
public TokenStream create(TokenStream input)
Description copied from class TokenFilterFactory: Transform the specified input TokenStream.
Specified by: create in class TokenFilterFactory
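The effect of the ignoreCase attribute on the resulting stream can be approximated with plain collections. The sketch below is an assumption-laden stand-in, not StopFilter itself: the class StopFilterSketch and its method removeStopWords are hypothetical, tokens are modeled as strings rather than a TokenStream, and case folding uses Locale.ROOT lowercasing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical model of stop-word removal over a list of string tokens. */
final class StopFilterSketch {

    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords, boolean ignoreCase) {
        // Normalize the stop set once so each token lookup is a single hash probe.
        Set<String> lookup = new HashSet<>();
        for (String word : stopWords) {
            lookup.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }

        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
            if (!lookup.contains(key)) {
                kept.add(token); // the surviving token keeps its original casing
            }
        }
        return kept;
    }
}
```

In this model, with the stop set {"the"} and tokens "The", "quick", "the", "fox", ignoreCase=true removes both "The" and "the", while ignoreCase=false removes only the lowercase "the".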