Package | Description |
---|---|
org.apache.lucene.analysis |
Text analysis.
|
org.apache.lucene.analysis.ar |
Analyzer for Arabic.
|
org.apache.lucene.analysis.bg |
Analyzer for Bulgarian.
|
org.apache.lucene.analysis.bn |
Analyzer for Bengali Language.
|
org.apache.lucene.analysis.br |
Analyzer for Brazilian Portuguese.
|
org.apache.lucene.analysis.ca |
Analyzer for Catalan.
|
org.apache.lucene.analysis.cjk |
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.
|
org.apache.lucene.analysis.ckb |
Analyzer for Sorani Kurdish.
|
org.apache.lucene.analysis.cn.smart |
Analyzer for Simplified Chinese, which indexes words.
|
org.apache.lucene.analysis.commongrams |
Construct n-grams for frequently occurring terms and phrases.
|
org.apache.lucene.analysis.compound |
A filter that decomposes compound words, found in many Germanic
languages, into their word parts.
|
org.apache.lucene.analysis.core |
Basic, general-purpose analysis components.
|
org.apache.lucene.analysis.custom |
A general-purpose Analyzer that can be created with a builder-style API.
|
org.apache.lucene.analysis.cz |
Analyzer for Czech.
|
org.apache.lucene.analysis.da |
Analyzer for Danish.
|
org.apache.lucene.analysis.de |
Analyzer for German.
|
org.apache.lucene.analysis.el |
Analyzer for Greek.
|
org.apache.lucene.analysis.en |
Analyzer for English.
|
org.apache.lucene.analysis.es |
Analyzer for Spanish.
|
org.apache.lucene.analysis.eu |
Analyzer for Basque.
|
org.apache.lucene.analysis.fa |
Analyzer for Persian.
|
org.apache.lucene.analysis.fi |
Analyzer for Finnish.
|
org.apache.lucene.analysis.fr |
Analyzer for French.
|
org.apache.lucene.analysis.ga |
Analyzer for Irish.
|
org.apache.lucene.analysis.gl |
Analyzer for Galician.
|
org.apache.lucene.analysis.hi |
Analyzer for Hindi.
|
org.apache.lucene.analysis.hu |
Analyzer for Hungarian.
|
org.apache.lucene.analysis.hunspell |
Stemming TokenFilter using a Java implementation of the
Hunspell stemming algorithm.
|
org.apache.lucene.analysis.hy |
Analyzer for Armenian.
|
org.apache.lucene.analysis.id |
Analyzer for Indonesian.
|
org.apache.lucene.analysis.in |
Analyzer for Indian languages.
|
org.apache.lucene.analysis.it |
Analyzer for Italian.
|
org.apache.lucene.analysis.lt |
Analyzer for Lithuanian.
|
org.apache.lucene.analysis.lv |
Analyzer for Latvian.
|
org.apache.lucene.analysis.minhash |
MinHash filtering (for LSH).
|
org.apache.lucene.analysis.miscellaneous |
Miscellaneous TokenStreams.
|
org.apache.lucene.analysis.ngram |
Character n-gram tokenizers and filters.
|
org.apache.lucene.analysis.nl |
Analyzer for Dutch.
|
org.apache.lucene.analysis.no |
Analyzer for Norwegian.
|
org.apache.lucene.analysis.path |
Analysis components for path-like strings such as filenames.
|
org.apache.lucene.analysis.pattern |
Set of components for pattern-based (regex) analysis.
|
org.apache.lucene.analysis.payloads |
Provides various convenience classes for creating payloads on Tokens.
|
org.apache.lucene.analysis.pt |
Analyzer for Portuguese.
|
org.apache.lucene.analysis.reverse |
Filter to reverse token text.
|
org.apache.lucene.analysis.ro |
Analyzer for Romanian.
|
org.apache.lucene.analysis.ru |
Analyzer for Russian.
|
org.apache.lucene.analysis.shingle |
Word n-gram filters.
|
org.apache.lucene.analysis.sinks | |
org.apache.lucene.analysis.snowball |
TokenFilter and Analyzer implementations that use Snowball
stemmers. |
org.apache.lucene.analysis.sr |
Analyzer for Serbian.
|
org.apache.lucene.analysis.standard |
Fast, general-purpose grammar-based tokenizer.
StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in
Unicode Standard Annex #29. |
org.apache.lucene.analysis.sv |
Analyzer for Swedish.
|
org.apache.lucene.analysis.synonym |
Analysis components for Synonyms.
|
org.apache.lucene.analysis.th |
Analyzer for Thai.
|
org.apache.lucene.analysis.tr |
Analyzer for Turkish.
|
org.apache.lucene.analysis.util |
Utility functions for text analysis.
|
org.apache.lucene.analysis.wikipedia |
Tokenizer that is aware of Wikipedia syntax.
|
org.apache.lucene.codecs |
Codecs API: API for customization of the encoding and structure of the index.
|
org.apache.lucene.document |
The logical representation of a
Document for indexing and searching. |
org.apache.lucene.index |
Code to maintain and access indices.
|
org.apache.lucene.index.memory |
High-performance, single-document, in-memory Apache Lucene full-text search index.
|
org.apache.lucene.search |
Code to search indices.
|
org.apache.lucene.search.highlight |
Highlighting search terms.
|
org.apache.lucene.search.uhighlight |
The UnifiedHighlighter, a flexible highlighter that can get offsets from postings, term vectors, or analysis.
|
org.apache.lucene.util |
Some utility classes.
|
org.apache.lucene.util.graph |
Utility classes for working with token streams as graphs.
|
Modifier and Type | Class and Description |
---|---|
private static class |
Analyzer.StringTokenStream |
class |
CachingTokenFilter
This class can be used if the token attributes of a TokenStream
are intended to be consumed more than once.
|
class |
FilteringTokenFilter
Abstract base class for TokenFilters that may remove tokens.
|
class |
GraphTokenFilter
An abstract TokenFilter that exposes its input stream as a graph.
Call
GraphTokenFilter.incrementBaseToken() to move the root of the graph to the next
position in the TokenStream, GraphTokenFilter.incrementGraphToken() to move along
the current graph, and GraphTokenFilter.incrementGraph() to reset to the next graph
based at the current root. |
class |
LowerCaseFilter
Normalizes token text to lower case.
|
class |
StopFilter
Removes stop words from a token stream.
|
class |
TokenFilter
A TokenFilter is a TokenStream whose input is another TokenStream.
|
class |
Tokenizer
A Tokenizer is a TokenStream whose input is a Reader.
|
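
The rows above describe a decorator chain: a Tokenizer reads raw characters, and each TokenFilter wraps another stream. The following is a minimal plain-Java model of that idea, not Lucene code. The class `TokenChainSketch` and its methods are hypothetical names, and `Iterator<String>` stands in for a token stream purely for illustration (Lucene's real `TokenStream` advances with `incrementToken()` and exposes token data through attributes).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;

// Plain-Java model of a tokenizer/filter chain; these are NOT Lucene classes.
final class TokenChainSketch {

    /** Source of the chain: splits text at Character.isWhitespace code points. */
    static Iterator<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isWhitespace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.iterator();
    }

    /** Filter stage: wraps another stream and lower-cases every token. */
    static Iterator<String> lowerCase(Iterator<String> input) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return input.next().toLowerCase(Locale.ROOT); }
        };
    }

    /** Filter stage: wraps another stream and drops tokens found in stopWords. */
    static Iterator<String> removeStopWords(Iterator<String> input, Set<String> stopWords) {
        return new Iterator<String>() {
            private String pending = advance();

            // Pulls from the wrapped stream until a non-stop-word token is found.
            private String advance() {
                while (input.hasNext()) {
                    String candidate = input.next();
                    if (!stopWords.contains(candidate)) {
                        return candidate;
                    }
                }
                return null;
            }

            @Override public boolean hasNext() { return pending != null; }

            @Override public String next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String result = pending;
                pending = advance();
                return result;
            }
        };
    }

    /** Builds the chain tokenizer -> lower-case -> stop filter and collects the output. */
    static List<String> analyze(String text, Set<String> stopWords) {
        Iterator<String> chain = removeStopWords(lowerCase(whitespaceTokens(text)), stopWords);
        List<String> out = new ArrayList<>();
        chain.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick brown Fox", Set.of("the", "a")));
    }
}
```

The order of the stages matters in this model: lower-casing runs before stop-word removal, so a lower-case stop set also catches "The".
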
Modifier and Type | Field and Description |
---|---|
protected TokenStream |
TokenFilter.input
The source of tokens for this filter.
|
protected TokenStream |
Analyzer.TokenStreamComponents.sink
Sink TokenStream, such as the outer TokenFilter decorating
the chain.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
Analyzer.TokenStreamComponents.getTokenStream()
Returns the sink
TokenStream. |
protected TokenStream |
Analyzer.normalize(java.lang.String fieldName,
TokenStream in)
Wrap the given
TokenStream in order to apply normalization filters. |
protected TokenStream |
AnalyzerWrapper.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
Analyzer.tokenStream(java.lang.String fieldName,
java.io.Reader reader)
Returns a TokenStream suitable for
fieldName, tokenizing
the contents of reader. |
TokenStream |
Analyzer.tokenStream(java.lang.String fieldName,
java.lang.String text)
Returns a TokenStream suitable for
fieldName, tokenizing
the contents of text. |
protected TokenStream |
DelegatingAnalyzerWrapper.wrapTokenStreamForNormalization(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
AnalyzerWrapper.wrapTokenStreamForNormalization(java.lang.String fieldName,
TokenStream in)
Wraps / alters the given TokenStream for normalization purposes, taken
from the wrapped Analyzer, to form new components.
|
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
Analyzer.normalize(java.lang.String fieldName,
TokenStream in)
Wrap the given
TokenStream in order to apply normalization filters. |
protected TokenStream |
AnalyzerWrapper.normalize(java.lang.String fieldName,
TokenStream in) |
Automaton |
TokenStreamToAutomaton.toAutomaton(TokenStream in)
Pulls the graph (including
PositionLengthAttribute) from the provided TokenStream, and creates the corresponding
automaton where arcs are bytes (or Unicode code points
if unicodeArcs = true) from each term. |
protected TokenStream |
DelegatingAnalyzerWrapper.wrapTokenStreamForNormalization(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
AnalyzerWrapper.wrapTokenStreamForNormalization(java.lang.String fieldName,
TokenStream in)
Wraps / alters the given TokenStream for normalization purposes, taken
from the wrapped Analyzer, to form new components.
|
Constructor and Description |
---|
CachingTokenFilter(TokenStream input)
Create a new CachingTokenFilter around
input. |
FilteringTokenFilter(TokenStream in)
Create a new
FilteringTokenFilter. |
GraphTokenFilter(TokenStream input)
Create a new GraphTokenFilter.
|
LowerCaseFilter(TokenStream in)
Create a new LowerCaseFilter, that normalizes token text to lower case.
|
StopFilter(TokenStream in,
CharArraySet stopWords)
Constructs a filter which removes words from the input TokenStream that are
named in the Set.
|
TokenFilter(TokenStream input)
Construct a token stream filtering the given input.
|
TokenStreamComponents(java.util.function.Consumer<java.io.Reader> source,
TokenStream result)
Creates a new
Analyzer.TokenStreamComponents instance. |
TokenStreamComponents(Tokenizer tokenizer,
TokenStream result)
Creates a new
Analyzer.TokenStreamComponents instance. |
Modifier and Type | Class and Description |
---|---|
class |
ArabicNormalizationFilter
A
TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class |
ArabicStemFilter
A
TokenFilter that applies ArabicStemmer to stem Arabic words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
ArabicNormalizationFilterFactory.create(TokenStream input) |
protected TokenStream |
ArabicAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
ArabicNormalizationFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
ArabicStemFilter |
ArabicStemFilterFactory.create(TokenStream input) |
TokenStream |
ArabicNormalizationFilterFactory.create(TokenStream input) |
protected TokenStream |
ArabicAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
ArabicNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
ArabicNormalizationFilter(TokenStream input) |
ArabicStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
BulgarianStemFilter
A
TokenFilter that applies BulgarianStemmer to stem Bulgarian
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
BulgarianStemFilterFactory.create(TokenStream input) |
protected TokenStream |
BulgarianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
BulgarianStemFilterFactory.create(TokenStream input) |
protected TokenStream |
BulgarianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
BulgarianStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
BengaliNormalizationFilter
A
TokenFilter that applies BengaliNormalizer to normalize the
orthography. |
class |
BengaliStemFilter
A
TokenFilter that applies BengaliStemmer to stem Bengali words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
BengaliNormalizationFilterFactory.create(TokenStream input) |
TokenStream |
BengaliStemFilterFactory.create(TokenStream input) |
protected TokenStream |
BengaliAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
BengaliNormalizationFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
BengaliNormalizationFilterFactory.create(TokenStream input) |
TokenStream |
BengaliStemFilterFactory.create(TokenStream input) |
protected TokenStream |
BengaliAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
BengaliNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
BengaliNormalizationFilter(TokenStream input) |
BengaliStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
BrazilianStemFilter
A
TokenFilter that applies BrazilianStemmer. |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
BrazilianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
BrazilianStemFilter |
BrazilianStemFilterFactory.create(TokenStream in) |
protected TokenStream |
BrazilianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
BrazilianStemFilter(TokenStream in)
Creates a new BrazilianStemFilter.
|
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
CatalanAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
CatalanAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
CJKBigramFilter
Forms bigrams of CJK terms that are generated from StandardTokenizer
or ICUTokenizer.
|
class |
CJKWidthFilter
A
TokenFilter that normalizes CJK width differences:
folds fullwidth ASCII variants into the equivalent Basic Latin, and
folds halfwidth Katakana variants into the equivalent kana.
|
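
As a rough illustration of the first folding rule in the CJKWidthFilter row, the sketch below maps fullwidth ASCII variants onto Basic Latin by a fixed code point offset. It is a plain-Java illustration under stated assumptions, not the Lucene implementation: `WidthFoldSketch` and `foldFullwidthAscii` are hypothetical names, and the halfwidth Katakana rule is deliberately left out.

```java
// Plain-Java illustration of width folding; NOT the Lucene CJKWidthFilter implementation.
final class WidthFoldSketch {

    /** Maps fullwidth ASCII variants (U+FF01..U+FF5E) to Basic Latin (U+0021..U+007E). */
    static String foldFullwidthAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp >= 0xFF01 && cp <= 0xFF5E) {
                // The two blocks are laid out in the same order, 0xFEE0 apart.
                out.appendCodePoint(cp - 0xFEE0);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Fullwidth "ABC123" written with Unicode escapes.
        System.out.println(foldFullwidthAscii("\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13"));
    }
}
```
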
Modifier and Type | Method and Description |
---|---|
TokenStream |
CJKBigramFilterFactory.create(TokenStream input) |
TokenStream |
CJKWidthFilterFactory.create(TokenStream input) |
protected TokenStream |
CJKAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
CJKWidthFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
CJKBigramFilterFactory.create(TokenStream input) |
TokenStream |
CJKWidthFilterFactory.create(TokenStream input) |
protected TokenStream |
CJKAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
CJKWidthFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
CJKBigramFilter(TokenStream in)
|
CJKBigramFilter(TokenStream in,
int flags)
|
CJKBigramFilter(TokenStream in,
int flags,
boolean outputUnigrams)
Create a new CJKBigramFilter, specifying which writing systems should be bigrammed,
and whether or not unigrams should also be output.
|
CJKWidthFilter(TokenStream input) |
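
The `outputUnigrams` option in the constructor above can be pictured with a small plain-Java sketch of overlapping bigrams. This is only an illustration: `BigramSketch` and `bigrams` are hypothetical names, the `flags` selection of writing systems is not modeled, and emitting a lone character as a unigram is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of overlapping bigrams; NOT the Lucene CJKBigramFilter implementation.
final class BigramSketch {

    /**
     * Returns overlapping two-code-point grams of a run of characters.
     * When outputUnigrams is true, every single character is emitted as well.
     * Assumption of this sketch: a run of length one is emitted as a unigram.
     */
    static List<String> bigrams(String run, boolean outputUnigrams) {
        int[] cps = run.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < cps.length; i++) {
            if (outputUnigrams || cps.length == 1) {
                out.add(new String(cps, i, 1));
            }
            if (i + 1 < cps.length) {
                out.add(new String(cps, i, 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A three-character run (U+6771 U+4EAC U+90FD) yields two overlapping bigrams.
        System.out.println(bigrams("\u6771\u4EAC\u90FD", false));
    }
}
```
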
Modifier and Type | Class and Description |
---|---|
class |
SoraniNormalizationFilter
A
TokenFilter that applies SoraniNormalizer to normalize the
orthography. |
class |
SoraniStemFilter
A
TokenFilter that applies SoraniStemmer to stem Sorani words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SoraniNormalizationFilterFactory.create(TokenStream input) |
protected TokenStream |
SoraniAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
SoraniNormalizationFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SoraniNormalizationFilterFactory.create(TokenStream input) |
SoraniStemFilter |
SoraniStemFilterFactory.create(TokenStream input) |
protected TokenStream |
SoraniAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
SoraniNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
SoraniNormalizationFilter(TokenStream input) |
SoraniStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
HMMChineseTokenizer
Tokenizer for Chinese or mixed Chinese-English text.
|
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
SmartChineseAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
SmartChineseAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
CommonGramsFilter
Construct bigrams for frequently occurring terms while indexing.
|
class |
CommonGramsQueryFilter
Wrap a CommonGramsFilter optimizing phrase queries by only returning single
words when they are not a member of a bigram.
|
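
A hedged sketch of the index-time idea behind CommonGramsFilter follows. It keeps every single term and adds a bigram whenever a pair contains a common word. The pairing rule and the `_` separator are assumptions of this sketch, and `CommonGramsSketch` is a hypothetical plain-Java class, not the Lucene filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of common-word bigrams; NOT the Lucene CommonGramsFilter implementation.
final class CommonGramsSketch {

    /**
     * Emits each token, followed by a bigram with its successor when either
     * member of the pair is in commonWords (assumed rule, assumed "_" separator).
     */
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // single terms are always kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(commonGrams(List.of("the", "quick", "brown"), Set.of("the")));
    }
}
```
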
Modifier and Type | Method and Description |
---|---|
TokenFilter |
CommonGramsQueryFilterFactory.create(TokenStream input)
Create a CommonGramsFilter and wrap it with a CommonGramsQueryFilter.
|
TokenFilter |
CommonGramsFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
CommonGramsFilter(TokenStream input,
CharArraySet commonWords)
Construct a token stream filtering the given input using a Set of common
words to create bigrams.
|
Modifier and Type | Class and Description |
---|---|
class |
CompoundWordTokenFilterBase
Base class for decomposition token filters.
|
class |
DictionaryCompoundWordTokenFilter
A
TokenFilter that decomposes compound words found in many Germanic languages. |
class |
HyphenationCompoundWordTokenFilter
A
TokenFilter that decomposes compound words found in many Germanic languages. |
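
Dictionary-based decomposition can be sketched as a scan for dictionary words inside the compound. The example below is a simplified plain-Java model, not the Lucene filter: `CompoundSketch` is a hypothetical class, the parameter names mirror the constructor signatures listed on this page, `onlyLongestMatch` is not modeled, and input is assumed to be lower-cased already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of dictionary decomposition; NOT the Lucene implementation.
final class CompoundSketch {

    /**
     * Returns the original word followed by every dictionary entry found inside it
     * whose length lies between minSubwordSize and maxSubwordSize.
     * Words shorter than minWordSize are returned unchanged.
     */
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize, int maxSubwordSize) {
        List<String> out = new ArrayList<>();
        out.add(word); // the original token is kept in this sketch
        if (word.length() < minWordSize) {
            return out;
        }
        for (int start = 0; start + minSubwordSize <= word.length(); start++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= word.length();
                 len++) {
                String candidate = word.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decompose("donaudampfschiff",
                Set.of("donau", "dampf", "schiff"), 5, 2, 15));
    }
}
```
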
Modifier and Type | Method and Description |
---|---|
TokenStream |
DictionaryCompoundWordTokenFilterFactory.create(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenFilter |
HyphenationCompoundWordTokenFilterFactory.create(TokenStream input) |
TokenStream |
DictionaryCompoundWordTokenFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
CompoundWordTokenFilterBase(TokenStream input,
CharArraySet dictionary) |
CompoundWordTokenFilterBase(TokenStream input,
CharArraySet dictionary,
boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(TokenStream input,
CharArraySet dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch) |
DictionaryCompoundWordTokenFilter(TokenStream input,
CharArraySet dictionary)
Creates a new
DictionaryCompoundWordTokenFilter |
DictionaryCompoundWordTokenFilter(TokenStream input,
CharArraySet dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Creates a new
DictionaryCompoundWordTokenFilter |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator)
Create a HyphenationCompoundWordTokenFilter with no dictionary.
|
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
CharArraySet dictionary)
Creates a new
HyphenationCompoundWordTokenFilter instance. |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
CharArraySet dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
Creates a new
HyphenationCompoundWordTokenFilter instance. |
HyphenationCompoundWordTokenFilter(TokenStream input,
HyphenationTree hyphenator,
int minWordSize,
int minSubwordSize,
int maxSubwordSize)
Create a HyphenationCompoundWordTokenFilter with no dictionary.
|
Modifier and Type | Class and Description |
---|---|
class |
DecimalDigitFilter
Folds all Unicode digits in
[:General_Category=Decimal_Number:]
to Basic Latin digits (0-9). |
class |
FlattenGraphFilter
Converts an incoming graph token stream, such as one from
SynonymGraphFilter, into a flat form so that
all nodes form a single linear chain with no side paths. |
class |
KeywordTokenizer
Emits the entire input as a single token.
|
class |
LetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters.
|
class |
TypeTokenFilter
Removes tokens whose types appear in a set of blocked types from a token stream.
|
class |
UnicodeWhitespaceTokenizer
A UnicodeWhitespaceTokenizer is a tokenizer that divides text at whitespace.
|
class |
UpperCaseFilter
Normalizes token text to UPPER CASE.
|
class |
WhitespaceTokenizer
A tokenizer that divides text at whitespace characters as defined by
Character.isWhitespace(int). |
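
The digit folding described for DecimalDigitFilter can be approximated with the JDK's own character tables. The sketch below is a plain-Java illustration, not the Lucene implementation. `DigitFoldSketch` is a hypothetical class that treats `Character.DECIMAL_DIGIT_NUMBER` as the equivalent of `General_Category=Decimal_Number`.

```java
// Plain-Java illustration of digit folding; NOT the Lucene DecimalDigitFilter implementation.
final class DigitFoldSketch {

    /** Replaces every decimal-digit code point with its Basic Latin digit (0-9). */
    static String foldDigits(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                // Character.digit returns the numeric value 0..9 for any decimal digit.
                out.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits one, two, three written with Unicode escapes.
        System.out.println(foldDigits("\u0661\u0662\u0663"));
    }
}
```
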
Modifier and Type | Method and Description |
---|---|
TokenStream |
UpperCaseFilterFactory.create(TokenStream input) |
TokenStream |
FlattenGraphFilterFactory.create(TokenStream input) |
TokenStream |
LowerCaseFilterFactory.create(TokenStream input) |
TokenStream |
StopFilterFactory.create(TokenStream input) |
TokenStream |
DecimalDigitFilterFactory.create(TokenStream input) |
TokenStream |
TypeTokenFilterFactory.create(TokenStream input) |
protected TokenStream |
StopAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
SimpleAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
UpperCaseFilterFactory.normalize(TokenStream input) |
TokenStream |
LowerCaseFilterFactory.normalize(TokenStream input) |
TokenStream |
DecimalDigitFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
UpperCaseFilterFactory.create(TokenStream input) |
TokenStream |
FlattenGraphFilterFactory.create(TokenStream input) |
TokenStream |
LowerCaseFilterFactory.create(TokenStream input) |
TokenStream |
StopFilterFactory.create(TokenStream input) |
TokenStream |
DecimalDigitFilterFactory.create(TokenStream input) |
TokenStream |
TypeTokenFilterFactory.create(TokenStream input) |
protected TokenStream |
StopAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
SimpleAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
UpperCaseFilterFactory.normalize(TokenStream input) |
TokenStream |
LowerCaseFilterFactory.normalize(TokenStream input) |
TokenStream |
DecimalDigitFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
DecimalDigitFilter(TokenStream input)
Creates a new DecimalDigitFilter over
input. |
FlattenGraphFilter(TokenStream in) |
LowerCaseFilter(TokenStream in)
Create a new LowerCaseFilter, that normalizes token text to lower case.
|
StopFilter(TokenStream in,
CharArraySet stopWords)
Constructs a filter which removes words from the input TokenStream that are
named in the Set.
|
TypeTokenFilter(TokenStream input,
java.util.Set<java.lang.String> stopTypes)
Create a new
TypeTokenFilter that filters tokens out
(useWhiteList=false). |
TypeTokenFilter(TokenStream input,
java.util.Set<java.lang.String> stopTypes,
boolean useWhiteList)
Create a new
TypeTokenFilter. |
UpperCaseFilter(TokenStream in)
Create a new UpperCaseFilter, that normalizes token text to upper case.
|
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
CustomAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
CustomAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
CzechStemFilter
A
TokenFilter that applies CzechStemmer to stem Czech words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
CzechStemFilterFactory.create(TokenStream input) |
protected TokenStream |
CzechAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
CzechStemFilterFactory.create(TokenStream input) |
protected TokenStream |
CzechAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
CzechStemFilter(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
DanishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
DanishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
GermanLightStemFilter
A
TokenFilter that applies GermanLightStemmer to stem German
words. |
class |
GermanMinimalStemFilter
A
TokenFilter that applies GermanMinimalStemmer to stem German
words. |
class |
GermanNormalizationFilter
Normalizes German characters according to the heuristics
of the
German2 snowball algorithm.
|
class |
GermanStemFilter
A
TokenFilter that stems German words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
GermanMinimalStemFilterFactory.create(TokenStream input) |
TokenStream |
GermanLightStemFilterFactory.create(TokenStream input) |
TokenStream |
GermanNormalizationFilterFactory.create(TokenStream input) |
protected TokenStream |
GermanAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
GermanNormalizationFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
GermanMinimalStemFilterFactory.create(TokenStream input) |
TokenStream |
GermanLightStemFilterFactory.create(TokenStream input) |
TokenStream |
GermanNormalizationFilterFactory.create(TokenStream input) |
GermanStemFilter |
GermanStemFilterFactory.create(TokenStream in) |
protected TokenStream |
GermanAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
GermanNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
GermanLightStemFilter(TokenStream input) |
GermanMinimalStemFilter(TokenStream input) |
GermanNormalizationFilter(TokenStream input) |
GermanStemFilter(TokenStream in)
Creates a
GermanStemFilter instance. |
Modifier and Type | Class and Description |
---|---|
class |
GreekLowerCaseFilter
Normalizes token text to lower case, removes some Greek diacritics,
and standardizes final sigma to sigma.
|
class |
GreekStemFilter
A
TokenFilter that applies GreekStemmer to stem Greek
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
GreekLowerCaseFilterFactory.create(TokenStream in) |
TokenStream |
GreekStemFilterFactory.create(TokenStream input) |
protected TokenStream |
GreekAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
GreekLowerCaseFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
GreekLowerCaseFilterFactory.create(TokenStream in) |
TokenStream |
GreekStemFilterFactory.create(TokenStream input) |
protected TokenStream |
GreekAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
GreekLowerCaseFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
GreekLowerCaseFilter(TokenStream in)
Create a GreekLowerCaseFilter that normalizes Greek token text.
|
GreekStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
EnglishMinimalStemFilter
A
TokenFilter that applies EnglishMinimalStemmer to stem
English words. |
class |
EnglishPossessiveFilter
TokenFilter that removes possessives (trailing 's) from words.
|
class |
KStemFilter
A high-performance KStem filter for English.
|
class |
PorterStemFilter
Transforms the token stream as per the Porter stemming algorithm.
|
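
Removing a trailing possessive, as EnglishPossessiveFilter is described to do, might look like the following plain-Java sketch. `PossessiveSketch` is a hypothetical class. Accepting an upper-case `S` and the typographic apostrophe U+2019 are assumptions of this sketch, not statements about the Lucene filter.

```java
// Plain-Java illustration of possessive stripping; NOT the Lucene implementation.
final class PossessiveSketch {

    /** Drops a trailing 's (apostrophe plus s) from a token, if present. */
    static String stripPossessive(String token) {
        int n = token.length();
        if (n >= 2) {
            char last = token.charAt(n - 1);
            char before = token.charAt(n - 2);
            boolean isS = last == 's' || last == 'S';                  // assumed: case-insensitive
            boolean isApostrophe = before == '\'' || before == '\u2019'; // assumed: curly apostrophe too
            if (isS && isApostrophe) {
                return token.substring(0, n - 2);
            }
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("dog's"));
    }
}
```
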
Modifier and Type | Method and Description |
---|---|
TokenStream |
EnglishMinimalStemFilterFactory.create(TokenStream input) |
TokenStream |
EnglishPossessiveFilterFactory.create(TokenStream input) |
protected TokenStream |
EnglishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
PorterStemFilter |
PorterStemFilterFactory.create(TokenStream input) |
TokenFilter |
KStemFilterFactory.create(TokenStream input) |
TokenStream |
EnglishMinimalStemFilterFactory.create(TokenStream input) |
TokenStream |
EnglishPossessiveFilterFactory.create(TokenStream input) |
protected TokenStream |
EnglishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
EnglishMinimalStemFilter(TokenStream input) |
EnglishPossessiveFilter(TokenStream input) |
KStemFilter(TokenStream in) |
PorterStemFilter(TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
SpanishLightStemFilter
A
TokenFilter that applies SpanishLightStemmer to stem Spanish
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SpanishLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
SpanishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SpanishLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
SpanishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
SpanishLightStemFilter(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
BasqueAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
BasqueAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
PersianNormalizationFilter
A
TokenFilter that applies PersianNormalizer to normalize the
orthography. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
PersianNormalizationFilterFactory.create(TokenStream input) |
protected TokenStream |
PersianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
PersianNormalizationFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
PersianNormalizationFilterFactory.create(TokenStream input) |
protected TokenStream |
PersianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
PersianNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
PersianNormalizationFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
FinnishLightStemFilter
A
TokenFilter that applies FinnishLightStemmer to stem Finnish
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
FinnishLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
FinnishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
FinnishLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
FinnishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
FinnishLightStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
FrenchLightStemFilter
A
TokenFilter that applies FrenchLightStemmer to stem French
words. |
class |
FrenchMinimalStemFilter
A
TokenFilter that applies FrenchMinimalStemmer to stem French
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
FrenchLightStemFilterFactory.create(TokenStream input) |
TokenStream |
FrenchMinimalStemFilterFactory.create(TokenStream input) |
protected TokenStream |
FrenchAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
FrenchLightStemFilterFactory.create(TokenStream input) |
TokenStream |
FrenchMinimalStemFilterFactory.create(TokenStream input) |
protected TokenStream |
FrenchAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
FrenchLightStemFilter(TokenStream input) |
FrenchMinimalStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
IrishLowerCaseFilter
Normalises token text to lower case, handling t-prothesis
and n-eclipsis (for example, 'nAthair' becomes 'n-athair').
|
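
The 'nAthair' example can be reproduced with a short plain-Java sketch. It assumes the rule is "a leading lower-case n or t followed by an upper-case vowel gets a hyphen before lower-casing". That rule, the vowel set, and the class `IrishLowerCaseSketch` are illustrative assumptions rather than the Lucene implementation.

```java
import java.util.Locale;

// Plain-Java illustration of Irish-aware lower-casing; NOT the Lucene implementation.
final class IrishLowerCaseSketch {

    /** Inserts a hyphen after a prefixed n or t before an upper-case vowel, then lower-cases. */
    static String lowerCaseIrish(String token) {
        if (token.length() > 1) {
            char first = token.charAt(0);
            char second = token.charAt(1);
            if ((first == 'n' || first == 't') && isUpperVowel(second)) {
                token = first + "-" + token.substring(1);
            }
        }
        return token.toLowerCase(Locale.ROOT);
    }

    // Assumed vowel set: plain and acute-accented upper-case vowels.
    private static boolean isUpperVowel(char c) {
        return "AEIOU\u00C1\u00C9\u00CD\u00D3\u00DA".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseIrish("nAthair"));
    }
}
```
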
Modifier and Type | Method and Description |
---|---|
TokenStream |
IrishLowerCaseFilterFactory.create(TokenStream input) |
protected TokenStream |
IrishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
IrishLowerCaseFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
IrishLowerCaseFilterFactory.create(TokenStream input) |
protected TokenStream |
IrishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
IrishLowerCaseFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
IrishLowerCaseFilter(TokenStream in)
Create an IrishLowerCaseFilter that normalises Irish token text.
|
Modifier and Type | Class and Description |
---|---|
class |
GalicianMinimalStemFilter
A
TokenFilter that applies GalicianMinimalStemmer to stem
Galician words. |
class |
GalicianStemFilter
A
TokenFilter that applies GalicianStemmer to stem
Galician words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
GalicianMinimalStemFilterFactory.create(TokenStream input) |
TokenStream |
GalicianStemFilterFactory.create(TokenStream input) |
protected TokenStream |
GalicianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
GalicianMinimalStemFilter(TokenStream input) |
GalicianStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
HindiNormalizationFilter
A
TokenFilter that applies HindiNormalizer to normalize the
orthography. |
class |
HindiStemFilter
A
TokenFilter that applies HindiStemmer to stem Hindi words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
HindiNormalizationFilterFactory.create(TokenStream input) |
TokenStream |
HindiStemFilterFactory.create(TokenStream input) |
protected TokenStream |
HindiAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
HindiNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
HindiNormalizationFilter(TokenStream input) |
HindiStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
HungarianLightStemFilter
A
TokenFilter that applies HungarianLightStemmer to stem
Hungarian words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
HungarianLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
HungarianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
HungarianLightStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
HunspellStemFilter
TokenFilter that uses hunspell affix rules and words to stem tokens.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
HunspellStemFilterFactory.create(TokenStream tokenStream) |
Constructor and Description |
---|
HunspellStemFilter(TokenStream input,
Dictionary dictionary)
Create a
HunspellStemFilter outputting all possible stems. |
HunspellStemFilter(TokenStream input,
Dictionary dictionary,
boolean dedup)
Create a
HunspellStemFilter outputting all possible stems. |
HunspellStemFilter(TokenStream input,
Dictionary dictionary,
boolean dedup,
boolean longestOnly)
Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided
Dictionary
|
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
ArmenianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
IndonesianStemFilter
A
TokenFilter that applies IndonesianStemmer to stem Indonesian words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
IndonesianStemFilterFactory.create(TokenStream input) |
protected TokenStream |
IndonesianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
IndonesianStemFilter(TokenStream input)
|
IndonesianStemFilter(TokenStream input,
boolean stemDerivational)
Create a new IndonesianStemFilter.
|
Modifier and Type | Class and Description |
---|---|
class |
IndicNormalizationFilter
A
TokenFilter that applies IndicNormalizer to normalize text
in Indian Languages. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
IndicNormalizationFilterFactory.create(TokenStream input) |
TokenStream |
IndicNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
IndicNormalizationFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
ItalianLightStemFilter
A
TokenFilter that applies ItalianLightStemmer to stem Italian
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
ItalianLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
ItalianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
ItalianLightStemFilter(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
LithuanianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
LatvianStemFilter
A
TokenFilter that applies LatvianStemmer to stem Latvian
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
LatvianStemFilterFactory.create(TokenStream input) |
protected TokenStream |
LatvianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
LatvianStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
MinHashFilter
Generate min hash tokens from an incoming stream of tokens.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
MinHashFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
MinHashFilter(TokenStream input,
int hashCount,
int bucketCount,
int hashSetSize,
boolean withRotation)
Create a MinHash filter.
|
Modifier and Type | Class and Description |
---|---|
class |
ASCIIFoldingFilter
This class converts alphabetic, numeric, and symbolic Unicode characters
which are not in the first 127 ASCII characters (the "Basic Latin" Unicode
block) into their ASCII equivalents, if one exists.
|
class |
CapitalizationFilter
A filter to apply normal capitalization rules to Tokens.
|
class |
CodepointCountFilter
Removes words that are too long or too short from the stream.
|
class |
ConcatenateGraphFilter
Concatenates/Joins every incoming token with a separator into one output token for every path through the
token stream (which is a graph).
|
class |
ConcatenatingTokenStream
A TokenStream that takes an array of input TokenStreams as sources, and
concatenates them together.
|
class |
ConditionalTokenFilter
Allows skipping TokenFilters based on the current set of attributes.
|
private class |
ConditionalTokenFilter.OneTimeWrapper |
class |
DateRecognizerFilter
Filters all tokens that cannot be parsed to a date, using the provided
DateFormat . |
class |
DelimitedTermFrequencyTokenFilter
Characters before the delimiter are the "token", the textual integer after is the term frequency.
|
class |
EmptyTokenStream
An always exhausted token stream.
|
class |
FingerprintFilter
Filter outputs a single token which is a concatenation of the sorted and
de-duplicated set of input tokens.
|
class |
FixBrokenOffsetsFilter
Deprecated.
Fix the token filters that create broken offsets in the first place.
|
class |
HyphenatedWordsFilter
When the plain text is extracted from documents, we will often have many words hyphenated and broken into
two lines.
|
class |
KeepWordFilter
A TokenFilter that only keeps tokens with text contained in the
required words.
|
class |
KeywordMarkerFilter
Marks terms as keywords via the
KeywordAttribute . |
class |
KeywordRepeatFilter
This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with
KeywordAttribute.setKeyword(boolean) set to true and once set to false . |
class |
LengthFilter
Removes words that are too long or too short from the stream.
|
class |
LimitTokenCountFilter
This TokenFilter limits the number of tokens while indexing.
|
class |
LimitTokenPositionFilter
This TokenFilter limits its emitted tokens to those with positions that
are not greater than the configured limit.
|
class |
PatternKeywordMarkerFilter
Marks terms as keywords via the
KeywordAttribute . |
class |
ProtectedTermFilter
A ConditionalTokenFilter that only applies its wrapped filters to tokens that
are not contained in a protected set.
|
class |
RemoveDuplicatesTokenFilter
A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.
|
class |
ScandinavianFoldingFilter
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.
|
class |
ScandinavianNormalizationFilter
This filter normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ
and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.
|
class |
SetKeywordMarkerFilter
Marks terms as keywords via the
KeywordAttribute . |
class |
StemmerOverrideFilter
Provides the ability to override any
KeywordAttribute aware stemmer
with custom dictionary-based stemming. |
class |
TrimFilter
Trims leading and trailing whitespace from Tokens in the stream.
|
class |
TruncateTokenFilter
A token filter for truncating the terms into a specific length.
|
class |
TypeAsSynonymFilter
Adds the
TypeAttribute.type() as a synonym,
i.e. |
class |
WordDelimiterFilter
Deprecated.
Use
WordDelimiterGraphFilter instead: it produces a correct
token graph so that e.g. PhraseQuery works correctly when it's used in
the search time analyzer. |
class |
WordDelimiterGraphFilter
Splits words into subwords and performs optional transformations on subword
groups, producing a correct token graph so that e.g.
|
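The FingerprintFilter description above (a single token built from the sorted, de-duplicated input tokens) can be sketched in plain Java. This is only an illustration of the idea, not Lucene's code: the class name `FingerprintSketch` is hypothetical, sorting uses natural `String` order, and the `maxOutputTokenSize` limit is left out.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of a fingerprint over a list of token strings. */
final class FingerprintSketch {
    static String fingerprint(List<String> tokens, char separator) {
        // TreeSet both removes duplicates and sorts the remaining tokens.
        return String.join(String.valueOf(separator), new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        System.out.println(fingerprint(List.of("the", "quick", "the", "brown"), ' '));
    }
}
```

Because the output depends only on the set of tokens, two inputs with the same words in a different order yield the same fingerprint in this sketch.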
Modifier and Type | Field and Description |
---|---|
private TokenStream |
ConditionalTokenFilter.delegate |
private TokenStream |
ConcatenateGraphFilter.inputTokenStream |
private TokenStream[] |
ConcatenatingTokenStream.sources |
Modifier and Type | Method and Description |
---|---|
TokenStream |
LimitTokenOffsetFilterFactory.create(TokenStream input) |
TokenStream |
KeywordMarkerFilterFactory.create(TokenStream input) |
TokenStream |
KeywordRepeatFilterFactory.create(TokenStream input) |
TokenStream |
LimitTokenCountFilterFactory.create(TokenStream input) |
TokenStream |
ASCIIFoldingFilterFactory.create(TokenStream input) |
TokenStream |
TruncateTokenFilterFactory.create(TokenStream input) |
TokenStream |
FingerprintFilterFactory.create(TokenStream input) |
TokenStream |
TypeAsSynonymFilterFactory.create(TokenStream input) |
TokenStream |
DateRecognizerFilterFactory.create(TokenStream input) |
TokenStream |
TrimFilterFactory.create(TokenStream input) |
TokenStream |
ConcatenateGraphFilterFactory.create(TokenStream input) |
TokenStream |
StemmerOverrideFilterFactory.create(TokenStream input) |
TokenStream |
ScandinavianFoldingFilterFactory.create(TokenStream input) |
TokenStream |
ConditionalTokenFilterFactory.create(TokenStream input) |
TokenStream |
FixBrokenOffsetsFilterFactory.create(TokenStream input) |
TokenStream |
LimitTokenPositionFilterFactory.create(TokenStream input) |
TokenStream |
KeepWordFilterFactory.create(TokenStream input) |
TokenStream |
ASCIIFoldingFilterFactory.normalize(TokenStream input) |
TokenStream |
TrimFilterFactory.normalize(TokenStream input) |
TokenStream |
ScandinavianNormalizationFilterFactory.normalize(TokenStream input) |
TokenStream |
ScandinavianFoldingFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
private static AttributeSource |
ConcatenatingTokenStream.combineSources(TokenStream... sources) |
DelimitedTermFrequencyTokenFilter |
DelimitedTermFrequencyTokenFilterFactory.create(TokenStream input) |
TokenStream |
LimitTokenOffsetFilterFactory.create(TokenStream input) |
TokenStream |
KeywordMarkerFilterFactory.create(TokenStream input) |
TokenStream |
KeywordRepeatFilterFactory.create(TokenStream input) |
TokenStream |
LimitTokenCountFilterFactory.create(TokenStream input) |
TokenStream |
ASCIIFoldingFilterFactory.create(TokenStream input) |
TokenStream |
TruncateTokenFilterFactory.create(TokenStream input) |
RemoveDuplicatesTokenFilter |
RemoveDuplicatesTokenFilterFactory.create(TokenStream input) |
TokenFilter |
WordDelimiterGraphFilterFactory.create(TokenStream input) |
TokenFilter |
WordDelimiterFilterFactory.create(TokenStream input)
Deprecated.
|
TokenStream |
FingerprintFilterFactory.create(TokenStream input) |
TokenStream |
TypeAsSynonymFilterFactory.create(TokenStream input) |
TokenStream |
DateRecognizerFilterFactory.create(TokenStream input) |
CapitalizationFilter |
CapitalizationFilterFactory.create(TokenStream input) |
TokenStream |
TrimFilterFactory.create(TokenStream input) |
TokenStream |
ConcatenateGraphFilterFactory.create(TokenStream input) |
ScandinavianNormalizationFilter |
ScandinavianNormalizationFilterFactory.create(TokenStream input) |
CodepointCountFilter |
CodepointCountFilterFactory.create(TokenStream input) |
TokenStream |
StemmerOverrideFilterFactory.create(TokenStream input) |
HyphenatedWordsFilter |
HyphenatedWordsFilterFactory.create(TokenStream input) |
TokenStream |
ScandinavianFoldingFilterFactory.create(TokenStream input) |
LengthFilter |
LengthFilterFactory.create(TokenStream input) |
TokenStream |
ConditionalTokenFilterFactory.create(TokenStream input) |
TokenStream |
FixBrokenOffsetsFilterFactory.create(TokenStream input) |
TokenStream |
LimitTokenPositionFilterFactory.create(TokenStream input) |
TokenStream |
KeepWordFilterFactory.create(TokenStream input) |
protected ConditionalTokenFilter |
ProtectedTermFilterFactory.create(TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inner) |
protected abstract ConditionalTokenFilter |
ConditionalTokenFilterFactory.create(TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inner)
Modify the incoming
TokenStream with a ConditionalTokenFilter |
TokenStream |
ASCIIFoldingFilterFactory.normalize(TokenStream input) |
TokenStream |
TrimFilterFactory.normalize(TokenStream input) |
TokenStream |
ScandinavianNormalizationFilterFactory.normalize(TokenStream input) |
TokenStream |
ScandinavianFoldingFilterFactory.normalize(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
protected ConditionalTokenFilter |
ProtectedTermFilterFactory.create(TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inner) |
protected abstract ConditionalTokenFilter |
ConditionalTokenFilterFactory.create(TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inner)
Modify the incoming
TokenStream with a ConditionalTokenFilter |
Constructor and Description |
---|
ASCIIFoldingFilter(TokenStream input) |
ASCIIFoldingFilter(TokenStream input,
boolean preserveOriginal)
Create a new
ASCIIFoldingFilter . |
CapitalizationFilter(TokenStream in)
Creates a CapitalizationFilter with the default parameters.
|
CapitalizationFilter(TokenStream in,
boolean onlyFirstWord,
CharArraySet keep,
boolean forceFirstLetter,
java.util.Collection<char[]> okPrefix,
int minWordLength,
int maxWordCount,
int maxTokenLength)
Creates a CapitalizationFilter with the specified parameters.
|
CodepointCountFilter(TokenStream in,
int min,
int max)
Create a new
CodepointCountFilter . |
ConcatenateGraphFilter(TokenStream inputTokenStream)
Creates a token stream to convert
input to a token stream
of accepted strings by its token stream graph. |
ConcatenateGraphFilter(TokenStream inputTokenStream,
boolean preserveSep,
boolean preservePositionIncrements,
int maxGraphExpansions)
Creates a token stream to convert
input to a token stream
of accepted strings by its token stream graph. |
ConcatenatingTokenStream(TokenStream... sources)
Create a new ConcatenatingTokenStream from a set of inputs
|
ConditionalTokenFilter(TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inputFactory)
Create a new ConditionalTokenFilter
|
DateRecognizerFilter(TokenStream input)
Uses
DateFormat.DEFAULT and Locale.ENGLISH to create a DateFormat instance. |
DateRecognizerFilter(TokenStream input,
java.text.DateFormat dateFormat) |
DelimitedTermFrequencyTokenFilter(TokenStream input) |
DelimitedTermFrequencyTokenFilter(TokenStream input,
char delimiter) |
FingerprintFilter(TokenStream input)
Create a new FingerprintFilter with default settings
|
FingerprintFilter(TokenStream input,
int maxOutputTokenSize,
char separator)
Create a new FingerprintFilter with control over all settings
|
FixBrokenOffsetsFilter(TokenStream in)
Deprecated.
|
HyphenatedWordsFilter(TokenStream in)
Creates a new HyphenatedWordsFilter
|
KeepWordFilter(TokenStream in,
CharArraySet words)
Create a new
KeepWordFilter . |
KeywordMarkerFilter(TokenStream in)
Creates a new
KeywordMarkerFilter |
KeywordRepeatFilter(TokenStream input)
Construct a token stream filtering the given input.
|
LengthFilter(TokenStream in,
int min,
int max)
Create a new
LengthFilter . |
LimitTokenCountFilter(TokenStream in,
int maxTokenCount)
Build a filter that only accepts tokens up to a maximum number.
|
LimitTokenCountFilter(TokenStream in,
int maxTokenCount,
boolean consumeAllTokens)
Build a filter that limits the maximum number of tokens per field.
|
LimitTokenOffsetFilter(TokenStream input,
int maxStartOffset)
Lets all tokens pass through until it sees one with a start offset greater than
maxStartOffset,
which won't pass and ends the stream. |
LimitTokenOffsetFilter(TokenStream input,
int maxStartOffset,
boolean consumeAllTokens) |
LimitTokenPositionFilter(TokenStream in,
int maxTokenPosition)
Build a filter that only accepts tokens up to and including the given maximum position.
|
LimitTokenPositionFilter(TokenStream in,
int maxTokenPosition,
boolean consumeAllTokens)
Build a filter that limits the maximum position of tokens to emit.
|
PatternKeywordMarkerFilter(TokenStream in,
java.util.regex.Pattern pattern)
Create a new
PatternKeywordMarkerFilter , that marks the current
token as a keyword if the token's term buffer matches the provided
Pattern via the KeywordAttribute . |
ProtectedTermFilter(CharArraySet protectedTerms,
TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inputFactory)
Creates a new ProtectedTermFilter
|
RemoveDuplicatesTokenFilter(TokenStream in)
Creates a new RemoveDuplicatesTokenFilter
|
ScandinavianFoldingFilter(TokenStream input) |
ScandinavianNormalizationFilter(TokenStream input) |
SetKeywordMarkerFilter(TokenStream in,
CharArraySet keywordSet)
Create a new SetKeywordMarkerFilter, that marks the current token as a
keyword if the token's term buffer is contained in the given set via the
KeywordAttribute . |
StemmerOverrideFilter(TokenStream input,
StemmerOverrideFilter.StemmerOverrideMap stemmerOverrideMap)
Create a new StemmerOverrideFilter, performing dictionary-based stemming
with the provided
dictionary . |
TrimFilter(TokenStream in)
Create a new
TrimFilter . |
TruncateTokenFilter(TokenStream input,
int length) |
TypeAsSynonymFilter(TokenStream input) |
TypeAsSynonymFilter(TokenStream input,
java.lang.String prefix) |
WordDelimiterFilter(TokenStream in,
byte[] charTypeTable,
int configurationFlags,
CharArraySet protWords)
Deprecated.
Creates a new WordDelimiterFilter
|
WordDelimiterFilter(TokenStream in,
int configurationFlags,
CharArraySet protWords)
Deprecated.
Creates a new WordDelimiterFilter using
WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE
as its charTypeTable |
WordDelimiterGraphFilter(TokenStream in,
boolean adjustInternalOffsets,
byte[] charTypeTable,
int configurationFlags,
CharArraySet protWords)
Creates a new WordDelimiterGraphFilter
|
WordDelimiterGraphFilter(TokenStream in,
int configurationFlags,
CharArraySet protWords)
Creates a new WordDelimiterGraphFilter using
WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE
as its charTypeTable |
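LengthFilter and CodepointCountFilter take the same `min` and `max` constructor arguments and share the same one-line description. The plain-Java sketch below shows where the two could differ, on the assumption (suggested by the class names, not stated in the table above) that LengthFilter measures UTF-16 code units while CodepointCountFilter measures Unicode code points, both with inclusive bounds. The class `LengthSketch` is hypothetical and does not use Lucene.

```java
/** Hypothetical comparison of two ways to measure a term's length. */
final class LengthSketch {
    // Assumed LengthFilter-style check: counts UTF-16 code units.
    static boolean acceptByLength(String term, int min, int max) {
        int len = term.length();
        return len >= min && len <= max;
    }

    // Assumed CodepointCountFilter-style check: counts Unicode code points.
    static boolean acceptByCodepointCount(String term, int min, int max) {
        int count = term.codePointCount(0, term.length());
        return count >= min && count <= max;
    }

    public static void main(String[] args) {
        String term = "a\uD83D\uDE00"; // 'a' plus one supplementary character
        System.out.println(acceptByLength(term, 1, 2));
        System.out.println(acceptByCodepointCount(term, 1, 2));
    }
}
```

The two checks agree for text in the Basic Multilingual Plane and only diverge when a term contains surrogate pairs.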
Constructor and Description |
---|
ConditionalTokenFilter(TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inputFactory)
Create a new ConditionalTokenFilter
|
ProtectedTermFilter(CharArraySet protectedTerms,
TokenStream input,
java.util.function.Function<TokenStream,TokenStream> inputFactory)
Creates a new ProtectedTermFilter
|
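The ProtectedTermFilter constructor takes a set of protected terms plus a `Function<TokenStream,TokenStream>` that builds the wrapped filters. The plain-Java sketch below shows the same idea on plain strings: the wrapped transformation runs only on tokens that are not in the protected set. It is a simplification, not Lucene's implementation; the class `ProtectedTermSketch` is hypothetical and the function operates per token rather than on a whole stream.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical per-token illustration of conditional filtering. */
final class ProtectedTermSketch {
    static List<String> apply(Set<String> protectedTerms,
                              List<String> tokens,
                              Function<String, String> inner) {
        return tokens.stream()
                // Protected tokens bypass the inner transformation unchanged.
                .map(t -> protectedTerms.contains(t) ? t : inner.apply(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(apply(Set.of("NASA"), List.of("NASA", "Rocket"), String::toLowerCase));
    }
}
```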
Modifier and Type | Class and Description |
---|---|
class |
EdgeNGramTokenFilter
Tokenizes the given token into n-grams of given size(s).
|
class |
EdgeNGramTokenizer
Tokenizes the input from an edge into n-grams of given size(s).
|
class |
NGramTokenFilter
Tokenizes the input into n-grams of the given size(s).
|
class |
NGramTokenizer
Tokenizes the input into n-grams of the given size(s).
|
Modifier and Type | Method and Description |
---|---|
TokenFilter |
EdgeNGramFilterFactory.create(TokenStream input) |
TokenFilter |
NGramFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
EdgeNGramTokenFilter(TokenStream input,
int gramSize)
Creates an EdgeNGramTokenFilter that produces edge n-grams of the given
size.
|
EdgeNGramTokenFilter(TokenStream input,
int minGram,
int maxGram,
boolean preserveOriginal)
Creates an EdgeNGramTokenFilter that, for a given input term, produces all
edge n-grams with lengths >= minGram and <= maxGram.
|
NGramTokenFilter(TokenStream input,
int gramSize)
Creates an NGramTokenFilter that produces n-grams of the indicated size.
|
NGramTokenFilter(TokenStream input,
int minGram,
int maxGram,
boolean preserveOriginal)
Creates an NGramTokenFilter that, for a given input term, produces all
contained n-grams with lengths >= minGram and <= maxGram.
|
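The `minGram`, `maxGram` and `preserveOriginal` parameters described above can be illustrated with a plain-Java sketch over a single term. It does not use Lucene; the class `NGramSketch` is hypothetical, the emission order is an assumption, and `preserveOriginal` is taken to mean "also keep the term when it is shorter than `minGram` or longer than `maxGram`".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of edge n-grams and n-grams for one term. */
final class NGramSketch {
    // Prefixes of the term with lengths minGram..maxGram (in code points).
    static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        int cps = term.codePointCount(0, term.length());
        for (int n = minGram; n <= maxGram && n <= cps; n++) {
            out.add(term.substring(0, term.offsetByCodePoints(0, n)));
        }
        // Assumed meaning of preserveOriginal: keep terms outside the size range.
        if (preserveOriginal && (cps < minGram || cps > maxGram)) {
            out.add(term);
        }
        return out;
    }

    // All contained substrings with lengths minGram..maxGram (in code points).
    static List<String> nGrams(String term, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        int cps = term.codePointCount(0, term.length());
        for (int start = 0; start < cps; start++) {
            int from = term.offsetByCodePoints(0, start);
            for (int n = minGram; n <= maxGram && start + n <= cps; n++) {
                out.add(term.substring(from, term.offsetByCodePoints(from, n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", 1, 3, false));
        System.out.println(nGrams("abc", 1, 2));
    }
}
```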
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
DutchAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
NorwegianLightStemFilter
A
TokenFilter that applies NorwegianLightStemmer to stem Norwegian
words. |
class |
NorwegianMinimalStemFilter
A
TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
NorwegianLightStemFilterFactory.create(TokenStream input) |
TokenStream |
NorwegianMinimalStemFilterFactory.create(TokenStream input) |
protected TokenStream |
NorwegianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
NorwegianLightStemFilter(TokenStream input)
|
NorwegianLightStemFilter(TokenStream input,
int flags)
Creates a new NorwegianLightStemFilter
|
NorwegianMinimalStemFilter(TokenStream input)
|
NorwegianMinimalStemFilter(TokenStream input,
int flags)
Creates a new NorwegianMinimalStemFilter
|
Modifier and Type | Class and Description |
---|---|
class |
PathHierarchyTokenizer
Tokenizer for path-like hierarchies.
|
class |
ReversePathHierarchyTokenizer
Tokenizer for domain-like hierarchies.
|
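As a rough illustration of what "path-like" and "domain-like" hierarchies mean, the plain-Java sketch below expands a path into its ancestor prefixes and a domain name into its suffixes. It does not use Lucene, the class `PathHierarchySketch` is hypothetical, and the handling of edge cases such as trailing delimiters is an assumption rather than the tokenizers' documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of hierarchy expansion on plain strings. */
final class PathHierarchySketch {
    // "/a/b/c" -> "/a", "/a/b", "/a/b/c"
    static List<String> pathTokens(String path, char delimiter) {
        List<String> out = new ArrayList<>();
        int idx = path.indexOf(delimiter, 1); // skip a leading delimiter
        while (idx >= 0) {
            out.add(path.substring(0, idx));
            idx = path.indexOf(delimiter, idx + 1);
        }
        if (!path.isEmpty()) {
            out.add(path);
        }
        return out;
    }

    // "www.site.co.uk" -> "www.site.co.uk", "site.co.uk", "co.uk", "uk"
    static List<String> reversePathTokens(String name, char delimiter) {
        List<String> out = new ArrayList<>();
        if (name.isEmpty()) {
            return out;
        }
        out.add(name);
        int idx = name.indexOf(delimiter);
        while (idx >= 0) {
            if (idx + 1 < name.length()) {
                out.add(name.substring(idx + 1));
            }
            idx = name.indexOf(delimiter, idx + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pathTokens("/a/b/c", '/'));
        System.out.println(reversePathTokens("www.site.co.uk", '.'));
    }
}
```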
Modifier and Type | Class and Description |
---|---|
class |
PatternCaptureGroupTokenFilter
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture
group in one or more patterns.
|
class |
PatternReplaceFilter
A TokenFilter which applies a Pattern to each token in the stream,
replacing match occurrences with the specified replacement string.
|
class |
PatternTokenizer
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream.
|
class |
SimplePatternSplitTokenizer
|
class |
SimplePatternTokenizer
|
Modifier and Type | Method and Description |
---|---|
PatternCaptureGroupTokenFilter |
PatternCaptureGroupFilterFactory.create(TokenStream input) |
PatternReplaceFilter |
PatternReplaceFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
PatternCaptureGroupTokenFilter(TokenStream input,
boolean preserveOriginal,
java.util.regex.Pattern... patterns) |
PatternReplaceFilter(TokenStream in,
java.util.regex.Pattern p,
java.lang.String replacement,
boolean all)
Constructs an instance to replace either the first, or all occurrences
|
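The `all` flag of the PatternReplaceFilter constructor chooses between replacing the first match and replacing every match. A minimal plain-Java sketch of that choice, using only `java.util.regex`, is shown below; the class `PatternReplaceSketch` is hypothetical and operates on a single term string rather than on a TokenStream.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of first-match versus all-match replacement. */
final class PatternReplaceSketch {
    static String replace(String term, Pattern p, String replacement, boolean all) {
        Matcher m = p.matcher(term);
        // all == true replaces every occurrence, otherwise only the first one.
        return all ? m.replaceAll(replacement) : m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        Pattern dash = Pattern.compile("-");
        System.out.println(replace("a-b-c", dash, "_", true));
        System.out.println(replace("a-b-c", dash, "_", false));
    }
}
```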
Modifier and Type | Class and Description |
---|---|
class |
DelimitedPayloadTokenFilter
Characters before the delimiter are the "token", those after are the payload.
|
class |
NumericPayloadTokenFilter
Assigns a payload to a token based on the
TypeAttribute |
class |
TokenOffsetPayloadTokenFilter
Adds the
OffsetAttribute.startOffset()
and OffsetAttribute.endOffset()
First 4 bytes are the start |
class |
TypeAsPayloadTokenFilter
Makes the
TypeAttribute a payload. |
Modifier and Type | Method and Description |
---|---|
TokenOffsetPayloadTokenFilter |
TokenOffsetPayloadTokenFilterFactory.create(TokenStream input) |
NumericPayloadTokenFilter |
NumericPayloadTokenFilterFactory.create(TokenStream input) |
DelimitedPayloadTokenFilter |
DelimitedPayloadTokenFilterFactory.create(TokenStream input) |
TypeAsPayloadTokenFilter |
TypeAsPayloadTokenFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
DelimitedPayloadTokenFilter(TokenStream input,
char delimiter,
PayloadEncoder encoder) |
NumericPayloadTokenFilter(TokenStream input,
float payload,
java.lang.String typeMatch) |
TokenOffsetPayloadTokenFilter(TokenStream input) |
TypeAsPayloadTokenFilter(TokenStream input) |
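The DelimitedPayloadTokenFilter description ("characters before the delimiter are the token, those after are the payload") can be sketched in plain Java as a split on the first delimiter. This is not Lucene's code: the class `DelimitedPayloadSketch` and its nested holder type are hypothetical, and UTF-8 encoding stands in for whatever `PayloadEncoder` would be supplied.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of splitting "term|payload" text. */
final class DelimitedPayloadSketch {
    /** Term text plus payload bytes; payload is null when no delimiter is present. */
    static final class TermAndPayload {
        final String term;
        final byte[] payload;

        TermAndPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    static TermAndPayload split(String raw, char delimiter) {
        int i = raw.indexOf(delimiter); // assumption: split at the first delimiter
        if (i < 0) {
            return new TermAndPayload(raw, null);
        }
        // Stand-in encoder: the payload text as UTF-8 bytes.
        byte[] payload = raw.substring(i + 1).getBytes(StandardCharsets.UTF_8);
        return new TermAndPayload(raw.substring(0, i), payload);
    }

    public static void main(String[] args) {
        TermAndPayload tp = split("fox|noun", '|');
        System.out.println(tp.term + " / " + new String(tp.payload, StandardCharsets.UTF_8));
    }
}
```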
Modifier and Type | Class and Description |
---|---|
class |
PortugueseLightStemFilter
A
TokenFilter that applies PortugueseLightStemmer to stem
Portuguese words. |
class |
PortugueseMinimalStemFilter
A
TokenFilter that applies PortugueseMinimalStemmer to stem
Portuguese words. |
class |
PortugueseStemFilter
A
TokenFilter that applies PortugueseStemmer to stem
Portuguese words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
PortugueseLightStemFilterFactory.create(TokenStream input) |
TokenStream |
PortugueseMinimalStemFilterFactory.create(TokenStream input) |
TokenStream |
PortugueseStemFilterFactory.create(TokenStream input) |
protected TokenStream |
PortugueseAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
PortugueseLightStemFilter(TokenStream input) |
PortugueseMinimalStemFilter(TokenStream input) |
PortugueseStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
ReverseStringFilter
Reverse token string, for example "country" => "yrtnuoc".
|
Modifier and Type | Method and Description |
---|---|
ReverseStringFilter |
ReverseStringFilterFactory.create(TokenStream in) |
Constructor and Description |
---|
ReverseStringFilter(TokenStream in)
Create a new ReverseStringFilter that reverses all tokens in the
supplied
TokenStream . |
ReverseStringFilter(TokenStream in,
char marker)
Create a new ReverseStringFilter that reverses and marks all tokens in the
supplied
TokenStream . |
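The two ReverseStringFilter constructors differ only in the optional `marker` character. The plain-Java sketch below reproduces the "country" => "yrtnuoc" example on a plain string; the class `ReverseStringSketch` is hypothetical, and placing the marker in front of the reversed text is an assumption about what "marks" means here.

```java
/** Hypothetical illustration of token reversal with an optional marker. */
final class ReverseStringSketch {
    static String reverse(String term) {
        // StringBuilder.reverse keeps surrogate pairs intact.
        return new StringBuilder(term).reverse().toString();
    }

    static String reverse(String term, char marker) {
        // Assumption: the marker is prepended to the reversed text.
        return marker + reverse(term);
    }

    public static void main(String[] args) {
        System.out.println(reverse("country"));
        System.out.println(reverse("country", '#'));
    }
}
```

Indexing reversed terms is commonly used so that a leading-wildcard search can be rewritten as a prefix search on the reversed form.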
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
RomanianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
RussianLightStemFilter
A
TokenFilter that applies RussianLightStemmer to stem Russian
words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
RussianLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
RussianAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
RussianLightStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
FixedShingleFilter
A FixedShingleFilter constructs shingles (token n-grams) from a token stream.
|
class |
ShingleFilter
A ShingleFilter constructs shingles (token n-grams) from a token stream.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
FixedShingleFilterFactory.create(TokenStream input) |
Modifier and Type | Method and Description |
---|---|
TokenStream |
FixedShingleFilterFactory.create(TokenStream input) |
ShingleFilter |
ShingleFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
FixedShingleFilter(TokenStream input,
int shingleSize)
Creates a FixedShingleFilter over an input token stream
|
FixedShingleFilter(TokenStream input,
int shingleSize,
java.lang.String tokenSeparator,
java.lang.String fillerToken)
Creates a FixedShingleFilter over an input token stream
|
ShingleFilter(TokenStream input)
Construct a ShingleFilter with default shingle size: 2.
|
ShingleFilter(TokenStream input,
int maxShingleSize)
Constructs a ShingleFilter with the specified shingle size from the
TokenStream input |
ShingleFilter(TokenStream input,
int minShingleSize,
int maxShingleSize)
Constructs a ShingleFilter with the specified minimum and maximum shingle sizes from the TokenStream input. |
ShingleFilter(TokenStream input,
java.lang.String tokenType)
Constructs a ShingleFilter with the specified token type for shingle tokens and the default shingle size of 2.
|
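To make "shingles (token n-grams)" concrete, here is a minimal plain-Java sketch of the idea. It is not Lucene's implementation: the class `ShingleSketch` and its `shingles` method are hypothetical names, the single-space separator is an assumption, and single tokens are not emitted alongside the shingles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this class is not part of Lucene.
final class ShingleSketch {

    /**
     * Returns every run of n adjacent tokens with minSize <= n <= maxSize,
     * joined by a single space (the separator is an assumption of this sketch).
     */
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("require 1 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Emit shingles of increasing size that begin at this token.
            for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }
}
```

Calling `ShingleSketch.shingles(tokens, 2, 2)` mirrors the default shingle size of 2 named in the constructor table, while a wider range such as `(2, 3)` corresponds to the min/max constructor.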
Modifier and Type | Class and Description |
---|---|
class |
TeeSinkTokenFilter
This TokenFilter provides the ability to set aside attribute states that have already been analyzed.
|
static class |
TeeSinkTokenFilter.SinkTokenStream
TokenStream output from a tee.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
TeeSinkTokenFilter.newSinkTokenStream()
Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream. |
Constructor and Description |
---|
TeeSinkTokenFilter(TokenStream input) |
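The tee/sink relationship described above can be pictured with a small plain-Java sketch. It only models the idea that every sink sees each token the tee consumes; `TeeSketch` and `newSink` are hypothetical names, tokens are plain strings rather than attribute states, and none of this is Lucene code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; this class is not part of Lucene.
final class TeeSketch implements Iterator<String> {
    private final Iterator<String> input;
    private final List<List<String>> sinks = new ArrayList<>();

    TeeSketch(Iterator<String> input) {
        this.input = input;
    }

    /** Registers and returns a new sink that records tokens consumed from now on. */
    List<String> newSink() {
        List<String> sink = new ArrayList<>();
        sinks.add(sink);
        return sink;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public String next() {
        String token = input.next();
        // Set the token aside in every sink before passing it downstream.
        for (List<String> sink : sinks) {
            sink.add(token);
        }
        return token;
    }
}
```

A sink only contains what the tee has actually consumed, so in this sketch the tee must be drained before a sink is complete.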
Modifier and Type | Class and Description |
---|---|
class |
SnowballFilter
A filter that stems words using a Snowball-generated stemmer.
|
Modifier and Type | Method and Description |
---|---|
TokenFilter |
SnowballPorterFilterFactory.create(TokenStream input) |
Constructor and Description |
---|
SnowballFilter(TokenStream input,
SnowballProgram stemmer) |
SnowballFilter(TokenStream in,
java.lang.String name)
Construct the named stemming filter.
|
Modifier and Type | Class and Description |
---|---|
class |
SerbianNormalizationFilter
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.
|
class |
SerbianNormalizationRegularFilter
Normalizes Serbian Cyrillic to Latin.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
SerbianNormalizationFilterFactory.create(TokenStream input) |
TokenStream |
SerbianNormalizationFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
SerbianNormalizationFilter(TokenStream input) |
SerbianNormalizationRegularFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
ClassicFilter
Normalizes tokens extracted with ClassicTokenizer. |
class |
ClassicTokenizer
A grammar-based tokenizer constructed with JFlex.
|
class |
StandardTokenizer
A grammar-based tokenizer constructed with JFlex.
|
class |
UAX29URLEmailTokenizer
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in Unicode Standard Annex #29.
URLs and email addresses are also tokenized according to the relevant RFCs.
|
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
ClassicAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
UAX29URLEmailAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
StandardAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Method and Description |
---|---|
TokenFilter |
ClassicFilterFactory.create(TokenStream input) |
protected TokenStream |
ClassicAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
UAX29URLEmailAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
protected TokenStream |
StandardAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
ClassicFilter(TokenStream in)
Constructs a ClassicFilter that filters the token stream in.
|
Modifier and Type | Class and Description |
---|---|
class |
SwedishLightStemFilter
A TokenFilter that applies SwedishLightStemmer to stem Swedish words. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SwedishLightStemFilterFactory.create(TokenStream input) |
protected TokenStream |
SwedishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Constructor and Description |
---|
SwedishLightStemFilter(TokenStream input) |
Modifier and Type | Class and Description |
---|---|
class |
SynonymFilter
Deprecated.
Use SynonymGraphFilter instead, and be sure to also apply FlattenGraphFilter at index time (not at search time). |
class |
SynonymGraphFilter
Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output. |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SynonymGraphFilterFactory.create(TokenStream input) |
TokenStream |
SynonymFilterFactory.create(TokenStream input)
Deprecated.
|
Constructor and Description |
---|
SynonymFilter(TokenStream input,
SynonymMap synonyms,
boolean ignoreCase)
Deprecated.
|
SynonymGraphFilter(TokenStream input,
SynonymMap synonyms,
boolean ignoreCase)
Apply previously built synonyms to incoming tokens.
|
Modifier and Type | Class and Description |
---|---|
class |
ThaiTokenizer
Tokenizer that uses BreakIterator to tokenize Thai text. |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
ThaiAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
Modifier and Type | Class and Description |
---|---|
class |
ApostropheFilter
Strips all characters after an apostrophe (including the apostrophe itself).
|
class |
TurkishLowerCaseFilter
Normalizes Turkish token text to lower case.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
TurkishLowerCaseFilterFactory.create(TokenStream input) |
TokenStream |
ApostropheFilterFactory.create(TokenStream input) |
protected TokenStream |
TurkishAnalyzer.normalize(java.lang.String fieldName,
TokenStream in) |
TokenStream |
TurkishLowerCaseFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
ApostropheFilter(TokenStream in) |
TurkishLowerCaseFilter(TokenStream in)
Creates a new TurkishLowerCaseFilter that normalizes Turkish token text
to lower case.
|
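The two Turkish-specific steps listed above, stripping everything from an apostrophe onward and lower-casing with Turkish rules, can be sketched on plain strings. This is an illustration under stated assumptions, not the Lucene filters themselves: `TurkishTokenSketch` and its methods are hypothetical, treating both `'` and U+2019 as apostrophes is an assumption, and lower-casing is delegated to the JDK's Turkish locale rules.

```java
import java.util.Locale;

// Hypothetical illustration only; this class is not part of Lucene.
final class TurkishTokenSketch {
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Drops the first apostrophe and everything after it. */
    static String stripAfterApostrophe(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Assumption: both the ASCII and the typographic apostrophe count.
            if (c == '\'' || c == '\u2019') {
                return token.substring(0, i);
            }
        }
        return token;
    }

    /** Lower-cases with Turkish rules: dotless I maps to dotless i, dotted I to i. */
    static String turkishLowerCase(String token) {
        return token.toLowerCase(TURKISH);
    }
}
```

The Turkish locale matters because the default English mapping would turn a capital `I` into `i`, whereas Turkish distinguishes dotted and dotless forms.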
Modifier and Type | Class and Description |
---|---|
class |
CharTokenizer
An abstract base class for simple, character-oriented tokenizers.
|
class |
ElisionFilter
Removes elisions from a TokenStream. |
class |
SegmentingTokenizerBase
Breaks text into sentences with a BreakIterator and
allows subclasses to decompose these sentences into words. |
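As a rough picture of what a "simple, character-oriented tokenizer" does, the sketch below splits text into maximal runs of code points accepted by a predicate. It is plain Java rather than a CharTokenizer subclass; `CharTokenizerSketch` and `tokenize` are hypothetical names, and the predicate stands in for the per-character decision a concrete tokenizer would supply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; this class is not part of Lucene.
final class CharTokenizerSketch {

    /** Splits text into maximal runs of code points for which isTokenChar is true. */
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs correctly
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Passing `Character::isLetter` yields a letter-based split, and `cp -> !Character.isWhitespace(cp)` a whitespace-based one.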
Modifier and Type | Method and Description |
---|---|
abstract TokenStream |
TokenFilterFactory.create(TokenStream input)
Transforms the specified input TokenStream.
|
TokenStream |
ElisionFilterFactory.create(TokenStream input) |
TokenStream |
TokenFilterFactory.normalize(TokenStream input)
Normalizes the specified input TokenStream.
The default implementation returns input unchanged;
filters that should also be applied at normalization time can delegate to
the create method. |
TokenStream |
ElisionFilterFactory.normalize(TokenStream input) |
Constructor and Description |
---|
ElisionFilter(TokenStream input,
CharArraySet articles)
Constructs an elision filter with a set of articles (stop words) to remove.
|
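For readers unfamiliar with elision, the following plain-Java sketch shows the kind of rewrite involved: an elided article in front of an apostrophe is dropped, so `l'avion` becomes `avion`. The class `ElisionSketch` is hypothetical and is not the Lucene filter; matching is case-sensitive here, and treating U+2019 as an apostrophe is an assumption.

```java
import java.util.Set;

// Hypothetical illustration only; this class is not part of Lucene.
final class ElisionSketch {

    /** Removes a leading article and its apostrophe if the article is in the given set. */
    static String removeElision(String token, Set<String> articles) {
        int apostrophe = -1;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '\'' || c == '\u2019') {
                apostrophe = i;
                break;
            }
        }
        // Only strip when the text before the apostrophe is a known article.
        if (apostrophe >= 0 && articles.contains(token.substring(0, apostrophe))) {
            return token.substring(apostrophe + 1);
        }
        return token;
    }
}
```

Words that merely contain an apostrophe, such as `aujourd'hui`, are left alone because their prefix is not in the article set.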
Modifier and Type | Class and Description |
---|---|
class |
WikipediaTokenizer
Extension of StandardTokenizer that is aware of Wikipedia syntax.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
StoredFieldsWriter.MergeVisitor.tokenStream(Analyzer analyzer,
TokenStream reuse) |
Modifier and Type | Class and Description |
---|---|
private static class |
FeatureField.FeatureTokenStream |
private static class |
Field.BinaryTokenStream |
private static class |
Field.StringTokenStream |
Modifier and Type | Field and Description |
---|---|
protected TokenStream |
Field.tokenStream
Pre-analyzed token stream for indexed fields. This is kept
separate from fieldsData because a field may have both:
for example, a field may hold a String value while
customizing how that value is tokenized.
|
Modifier and Type | Method and Description |
---|---|
TokenStream |
FeatureField.tokenStream(Analyzer analyzer,
TokenStream reuse) |
TokenStream |
Field.tokenStream(Analyzer analyzer,
TokenStream reuse) |
TokenStream |
LazyDocument.LazyField.tokenStream(Analyzer analyzer,
TokenStream reuse) |
TokenStream |
Field.tokenStreamValue()
The TokenStream for this field to be used when indexing, or null.
|
Modifier and Type | Method and Description |
---|---|
void |
Field.setTokenStream(TokenStream tokenStream)
Expert: sets the token stream to be used for indexing and causes
isIndexed() and isTokenized() to return true.
|
TokenStream |
FeatureField.tokenStream(Analyzer analyzer,
TokenStream reuse) |
TokenStream |
Field.tokenStream(Analyzer analyzer,
TokenStream reuse) |
TokenStream |
LazyDocument.LazyField.tokenStream(Analyzer analyzer,
TokenStream reuse) |
Constructor and Description |
---|
Field(java.lang.String name,
TokenStream tokenStream,
IndexableFieldType type)
Create field with TokenStream value.
|
TextField(java.lang.String name,
TokenStream stream)
Creates a new un-stored TextField with TokenStream value.
|
Modifier and Type | Field and Description |
---|---|
(package private) TokenStream |
DefaultIndexingChain.PerField.tokenStream |
Modifier and Type | Method and Description |
---|---|
TokenStream |
SortingStoredFieldsConsumer.CopyVisitor.tokenStream(Analyzer analyzer,
TokenStream reuse) |
TokenStream |
IndexableField.tokenStream(Analyzer analyzer,
TokenStream reuse)
Creates the TokenStream used for indexing this field.
|
Modifier and Type | Method and Description |
---|---|
<T> TokenStream |
MemoryIndex.keywordTokenStream(java.util.Collection<T> keywords)
Convenience method that creates and returns a token stream generating a
token for each keyword in the given collection, "as is", without any
transforming text analysis.
|
Modifier and Type | Method and Description |
---|---|
void |
MemoryIndex.addField(java.lang.String fieldName,
TokenStream stream)
Iterates over the given token stream and adds the resulting terms to the index.
Equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field. |
void |
MemoryIndex.addField(java.lang.String fieldName,
TokenStream stream,
int positionIncrementGap)
Iterates over the given token stream and adds the resulting terms to the index.
Equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field. |
void |
MemoryIndex.addField(java.lang.String fieldName,
TokenStream tokenStream,
int positionIncrementGap,
int offsetGap)
Iterates over the given token stream and adds the resulting terms to the index.
Equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field. |
private void |
MemoryIndex.storeTerms(MemoryIndex.Info info,
TokenStream tokenStream,
int positionIncrementGap,
int offsetGap) |
Modifier and Type | Method and Description |
---|---|
TermAutomatonQuery |
TokenStreamToTermAutomatonQuery.toQuery(java.lang.String field,
TokenStream in)
Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding
automaton where arcs are bytes (or Unicode code points
if unicodeArcs = true) from each term. |
Modifier and Type | Class and Description |
---|---|
(package private) class |
LimitTokenOffsetFilter
This is a simplified version of org.apache.lucene.analysis.miscellaneous.LimitTokenOffsetFilter to prevent
a dependency on analyzers-common.jar.
|
class |
OffsetLimitTokenFilter
This TokenFilter limits the number of tokens while indexing by adding up the
current offset.
|
class |
TokenStreamFromTermVector
TokenStream created from a term vector field.
|
Modifier and Type | Field and Description |
---|---|
private TokenStream |
WeightedSpanTermExtractor.tokenStream |
Modifier and Type | Method and Description |
---|---|
static TokenStream |
TokenSources.getAnyTokenStream(IndexReader reader,
int docId,
java.lang.String field,
Analyzer analyzer)
Deprecated.
|
static TokenStream |
TokenSources.getAnyTokenStream(IndexReader reader,
int docId,
java.lang.String field,
Document document,
Analyzer analyzer)
Deprecated.
|
static TokenStream |
TokenSources.getTermVectorTokenStreamOrNull(java.lang.String field,
Fields tvFields,
int maxStartOffset)
Get a token stream by un-inverting the term vector.
|
TokenStream |
WeightedSpanTermExtractor.getTokenStream()
Returns the tokenStream which may have been wrapped in a CachingTokenFilter.
|
static TokenStream |
TokenSources.getTokenStream(Document doc,
java.lang.String field,
Analyzer analyzer)
Deprecated.
|
static TokenStream |
TokenSources.getTokenStream(IndexReader reader,
int docId,
java.lang.String field,
Analyzer analyzer)
Deprecated.
|
static TokenStream |
TokenSources.getTokenStream(java.lang.String field,
Fields tvFields,
java.lang.String text,
Analyzer analyzer,
int maxStartOffset)
Get a token stream from either un-inverting a term vector if possible, or by analyzing the text.
|
static TokenStream |
TokenSources.getTokenStream(java.lang.String field,
java.lang.String contents,
Analyzer analyzer)
Deprecated.
|
static TokenStream |
TokenSources.getTokenStream(Terms tpv)
Deprecated.
|
static TokenStream |
TokenSources.getTokenStream(Terms vector,
boolean tokenPositionsGuaranteedContiguous)
Deprecated.
|
static TokenStream |
TokenSources.getTokenStreamWithOffsets(IndexReader reader,
int docId,
java.lang.String field)
Deprecated.
|
TokenStream |
QueryScorer.init(TokenStream tokenStream) |
TokenStream |
Scorer.init(TokenStream tokenStream)
Called to init the Scorer with a TokenStream. |
TokenStream |
QueryTermScorer.init(TokenStream tokenStream) |
private TokenStream |
QueryScorer.initExtractor(TokenStream tokenStream) |
Modifier and Type | Method and Description |
---|---|
java.lang.String |
Highlighter.getBestFragment(TokenStream tokenStream,
java.lang.String text)
Highlights chosen terms in a text, extracting the most relevant section.
|
java.lang.String[] |
Highlighter.getBestFragments(TokenStream tokenStream,
java.lang.String text,
int maxNumFragments)
Highlights chosen terms in a text, extracting the most relevant sections.
|
java.lang.String |
Highlighter.getBestFragments(TokenStream tokenStream,
java.lang.String text,
int maxNumFragments,
java.lang.String separator)
Highlights terms in the text, extracting the most relevant sections
and concatenating the chosen fragments with a separator (typically "...").
|
TextFragment[] |
Highlighter.getBestTextFragments(TokenStream tokenStream,
java.lang.String text,
boolean mergeContiguousFragments,
int maxNumFragments)
Low-level API to get the most relevant (formatted) sections of the document.
|
java.util.Map<java.lang.String,WeightedSpanTerm> |
WeightedSpanTermExtractor.getWeightedSpanTerms(Query query,
float boost,
TokenStream tokenStream)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
java.util.Map<java.lang.String,WeightedSpanTerm> |
WeightedSpanTermExtractor.getWeightedSpanTerms(Query query,
float boost,
TokenStream tokenStream,
java.lang.String fieldName)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
java.util.Map<java.lang.String,WeightedSpanTerm> |
WeightedSpanTermExtractor.getWeightedSpanTermsWithScores(Query query,
float boost,
TokenStream tokenStream,
java.lang.String fieldName,
IndexReader reader)
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
TokenStream |
QueryScorer.init(TokenStream tokenStream) |
TokenStream |
Scorer.init(TokenStream tokenStream)
Called to init the Scorer with a TokenStream. |
TokenStream |
QueryTermScorer.init(TokenStream tokenStream) |
private TokenStream |
QueryScorer.initExtractor(TokenStream tokenStream) |
void |
NullFragmenter.start(java.lang.String s,
TokenStream tokenStream) |
void |
SimpleFragmenter.start(java.lang.String originalText,
TokenStream stream) |
void |
Fragmenter.start(java.lang.String originalText,
TokenStream tokenStream)
Initializes the Fragmenter.
|
void |
SimpleSpanFragmenter.start(java.lang.String originalText,
TokenStream tokenStream) |
Constructor and Description |
---|
LimitTokenOffsetFilter(TokenStream input,
int maxStartOffset) |
OffsetLimitTokenFilter(TokenStream input,
int offsetLimit) |
TokenGroup(TokenStream tokenStream) |
Modifier and Type | Class and Description |
---|---|
private static class |
AnalysisOffsetStrategy.MultiValueTokenStream
Wraps an Analyzer and string text that represents multiple values delimited by a specified character. |
Modifier and Type | Field and Description |
---|---|
(package private) TokenStream |
TokenStreamOffsetStrategy.TokenStreamOffsetsEnum.stream |
Modifier and Type | Method and Description |
---|---|
protected TokenStream |
AnalysisOffsetStrategy.tokenStream(java.lang.String content) |
Modifier and Type | Method and Description |
---|---|
private static FilteringTokenFilter |
MemoryIndexOffsetStrategy.newKeepWordFilter(TokenStream tokenStream,
CharacterRunAutomaton charRunAutomaton) |
Constructor and Description |
---|
MultiValueTokenStream(TokenStream subTokenStream,
java.lang.String fieldName,
Analyzer indexAnalyzer,
java.lang.String content,
char splitChar,
int splitCharIdx) |
TokenStreamOffsetsEnum(TokenStream ts,
CharacterRunAutomaton[] matchers) |
Modifier and Type | Method and Description |
---|---|
protected Query |
QueryBuilder.analyzeBoolean(java.lang.String field,
TokenStream stream)
Creates a simple boolean query from the cached token stream contents.
|
protected Query |
QueryBuilder.analyzeGraphBoolean(java.lang.String field,
TokenStream source,
BooleanClause.Occur operator)
Creates a boolean query from a graph token stream.
|
protected Query |
QueryBuilder.analyzeGraphPhrase(TokenStream source,
java.lang.String field,
int phraseSlop)
Creates a graph phrase query from the token stream contents.
|
protected Query |
QueryBuilder.analyzeMultiBoolean(java.lang.String field,
TokenStream stream,
BooleanClause.Occur operator)
Creates a complex boolean query from the cached token stream contents.
|
protected Query |
QueryBuilder.analyzeMultiPhrase(java.lang.String field,
TokenStream stream,
int slop)
Creates a complex phrase query from the cached token stream contents.
|
protected Query |
QueryBuilder.analyzePhrase(java.lang.String field,
TokenStream stream,
int slop)
Creates a simple phrase query from the cached token stream contents.
|
protected Query |
QueryBuilder.analyzeTerm(java.lang.String field,
TokenStream stream)
Creates a simple term query from the cached token stream contents.
|
protected Query |
QueryBuilder.createFieldQuery(TokenStream source,
BooleanClause.Occur operator,
java.lang.String field,
boolean quoted,
int phraseSlop)
Creates a query from a token stream.
|
protected SpanQuery |
QueryBuilder.createSpanQuery(TokenStream in,
java.lang.String field)
Creates a span query from the tokenstream.
|
Modifier and Type | Class and Description |
---|---|
private class |
GraphTokenStreamFiniteStrings.FiniteStringsTokenStream |
Modifier and Type | Method and Description |
---|---|
java.util.Iterator<TokenStream> |
GraphTokenStreamFiniteStrings.getFiniteStrings()
Get all finite strings from the automaton.
|
java.util.Iterator<TokenStream> |
GraphTokenStreamFiniteStrings.getFiniteStrings(int startState,
int endState)
Get all finite strings that start at startState and end at endState. |
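"All finite strings" of a token graph are the distinct token sequences obtained by following every path from a start state to an end state. The plain-Java sketch below enumerates such paths over a small arc list. It is not Lucene's automaton code: `FiniteStringsSketch` and `Arc` are hypothetical, terms are plain strings, and the graph is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this class is not part of Lucene.
final class FiniteStringsSketch {

    /** One token spanning from state "from" to state "to". */
    static final class Arc {
        final int from;
        final int to;
        final String term;

        Arc(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    /** Returns every term sequence along a path from startState to endState. */
    static List<List<String>> finiteStrings(List<Arc> arcs, int startState, int endState) {
        List<List<String>> out = new ArrayList<>();
        walk(arcs, startState, endState, new ArrayList<>(), out);
        return out;
    }

    // Depth-first search; assumes the graph has no cycles.
    private static void walk(List<Arc> arcs, int state, int endState,
                             List<String> path, List<List<String>> out) {
        if (state == endState) {
            out.add(new ArrayList<>(path));
            return;
        }
        for (Arc arc : arcs) {
            if (arc.from == state) {
                path.add(arc.term);
                walk(arcs, arc.to, endState, path, out);
                path.remove(path.size() - 1); // backtrack
            }
        }
    }
}
```

With a synonym such as `wifi` alongside `wi fi`, the graph has two routes between the same pair of states, so two sequences come back for a query like "wifi network".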
Modifier and Type | Method and Description |
---|---|
private Automaton |
GraphTokenStreamFiniteStrings.build(TokenStream in)
Build an automaton from the provided TokenStream. |
Constructor and Description |
---|
GraphTokenStreamFiniteStrings(TokenStream in) |