Package | Description |
---|---|
org.apache.lucene.analysis | Text analysis. |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.core | Basic, general-purpose analysis components. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.path | Analysis components for path-like strings such as filenames. |
org.apache.lucene.analysis.pattern | Set of components for pattern-based (regex) analysis. |
org.apache.lucene.analysis.standard | Fast, general-purpose grammar-based tokenizer: StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.util | Utility functions for text analysis. |
org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. |
org.apache.lucene.collation | Unicode collation support. |
org.apache.lucene.util | Some utility classes. |

Modifier and Type | Field and Description |
---|---|
static AttributeFactory | TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY: Default AttributeFactory instance that should be used for TokenStreams. |

Modifier and Type | Method and Description |
---|---|
protected AttributeFactory | Analyzer.attributeFactory(java.lang.String fieldName) |
protected AttributeFactory | AnalyzerWrapper.attributeFactory(java.lang.String fieldName) |

Constructor and Description |
---|
StringTokenStream(AttributeFactory attributeFactory, java.lang.String value, int length) |
Tokenizer(AttributeFactory factory): Construct a tokenizer with no input, awaiting a call to Tokenizer.setReader(java.io.Reader) to provide input. |
TokenStream(AttributeFactory factory): A TokenStream using the supplied AttributeFactory for creating new Attribute instances. |

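The constructors above hand an AttributeFactory to the stream, which then asks that factory for an attribute instance when an attribute type is first requested. The stdlib-only model below illustrates that idea; it is not Lucene code. The class AttributeSourceSketch, its nested Attribute and CharTermAttribute types, and the use of a java.util.function.Function as the "factory" are hypothetical stand-ins, and the sketch assumes (as we understand AttributeSource.addAttribute to behave) that repeated requests for the same interface return one cached instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, stdlib-only model of a stream that obtains attribute
// instances from a pluggable factory; not the Lucene AttributeSource API.
final class AttributeSourceSketch {
    interface Attribute {}
    interface CharTermAttribute extends Attribute {}
    static final class CharTermAttributeImpl implements CharTermAttribute {}

    // The "factory": maps an attribute interface to a fresh implementation.
    private final Function<Class<? extends Attribute>, Attribute> factory;
    // One instance per interface, created lazily on first request.
    private final Map<Class<? extends Attribute>, Attribute> attributes = new HashMap<>();

    AttributeSourceSketch(Function<Class<? extends Attribute>, Attribute> factory) {
        this.factory = factory;
    }

    <A extends Attribute> A addAttribute(Class<A> iface) {
        // The factory is consulted only if this interface was not requested before.
        Attribute instance = attributes.computeIfAbsent(iface, factory);
        return iface.cast(instance);
    }
}
```

Routing creation through a single factory object is what lets a caller substitute different attribute implementations without touching the tokenizer code itself.
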
Modifier and Type | Method and Description |
---|---|
Tokenizer | HMMChineseTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
HMMChineseTokenizer(AttributeFactory factory): Creates a new HMMChineseTokenizer, supplying the AttributeFactory. |

Modifier and Type | Method and Description |
---|---|
Tokenizer | WhitespaceTokenizerFactory.create(AttributeFactory factory) |
KeywordTokenizer | KeywordTokenizerFactory.create(AttributeFactory factory) |
LetterTokenizer | LetterTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
KeywordTokenizer(AttributeFactory factory, int bufferSize) |
LetterTokenizer(AttributeFactory factory): Construct a new LetterTokenizer using a given AttributeFactory. |
LetterTokenizer(AttributeFactory factory, int maxTokenLen): Construct a new LetterTokenizer using a given AttributeFactory. |
UnicodeWhitespaceTokenizer(AttributeFactory factory): Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory. |
UnicodeWhitespaceTokenizer(AttributeFactory factory, int maxTokenLen): Construct a new UnicodeWhitespaceTokenizer using a given AttributeFactory. |
WhitespaceTokenizer(AttributeFactory factory): Construct a new WhitespaceTokenizer using a given AttributeFactory. |
WhitespaceTokenizer(AttributeFactory factory, int maxTokenLen): Construct a new WhitespaceTokenizer using a given AttributeFactory. |

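These tokenizers differ in which characters end a token (whitespace for WhitespaceTokenizer, non-letters for LetterTokenizer), and the maxTokenLen variants cap the token length. The sketch below models the whitespace case in plain Java; it is a hypothetical helper, not Lucene code. It assumes, as we understand CharTokenizer to behave, that a run longer than maxTokenLen is cut into consecutive pieces rather than dropped; it also measures length in code points and uses Character.isWhitespace, either of which may differ from the real tokenizers.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model of whitespace tokenization with a maximum token length.
// Hypothetical helper; not the Lucene WhitespaceTokenizer implementation.
final class WhitespaceSketch {
    static List<String> tokenize(String text, int maxTokenLen) {
        if (maxTokenLen < 1) throw new IllegalArgumentException("maxTokenLen must be >= 1");
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentLen = 0; // length of the current token, in code points
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                // Whitespace ends the current token, if there is one.
                if (currentLen > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    currentLen = 0;
                }
            } else {
                current.appendCodePoint(cp);
                currentLen++;
                // Assumed behaviour: emit as soon as the cap is reached, then continue.
                if (currentLen == maxTokenLen) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    currentLen = 0;
                }
            }
        }
        if (currentLen > 0) tokens.add(current.toString());
        return tokens;
    }
}
```

With a cap of 5, "hello  wonderful world" becomes hello, wonde, rful, world under this sketch's rules.
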
Modifier and Type | Method and Description |
---|---|
Tokenizer | NGramTokenizerFactory.create(AttributeFactory factory) |
Tokenizer | EdgeNGramTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
EdgeNGramTokenizer(AttributeFactory factory, int minGram, int maxGram): Creates an EdgeNGramTokenizer that generates n-grams with sizes in the given range. |
NGramTokenizer(AttributeFactory factory, int minGram, int maxGram): Creates an NGramTokenizer with the given minimum and maximum n-gram sizes. |
NGramTokenizer(AttributeFactory factory, int minGram, int maxGram, boolean edgesOnly) |

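The n-gram constructors take a minGram/maxGram range, and one NGramTokenizer constructor adds an edgesOnly flag; EdgeNGramTokenizer keeps only grams anchored at the start of the input. A minimal model over a single string follows. It is a hypothetical helper, not Lucene code: it works on UTF-16 chars, ignores the token-character handling the real tokenizers perform, and emits grams grouped by start position, then by increasing length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of character n-gram generation for one string; not Lucene code.
final class NGramSketch {
    static List<String> ngrams(String text, int minGram, int maxGram, boolean edgesOnly) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // With edgesOnly, only grams starting at offset 0 are produced.
        int lastStart = edgesOnly ? 0 : text.length() - minGram;
        for (int start = 0; start <= lastStart; start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }
}
```

For example, ngrams("abc", 1, 2, false) yields a, ab, b, bc, c, while ngrams("abc", 1, 3, true) yields a, ab, abc.
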
Modifier and Type | Method and Description |
---|---|
Tokenizer | PathHierarchyTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
PathHierarchyTokenizer(AttributeFactory factory, char delimiter, char replacement, int skip) |
PathHierarchyTokenizer(AttributeFactory factory, int bufferSize, char delimiter, char replacement, int skip) |
ReversePathHierarchyTokenizer(AttributeFactory factory, char delimiter, char replacement, int skip) |
ReversePathHierarchyTokenizer(AttributeFactory factory, int bufferSize, char delimiter, char replacement, int skip) |

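PathHierarchyTokenizer turns a path into one token per ancestor prefix, which is typically used so that a query on a directory matches everything beneath it. The delimiter, replacement and skip parameters control where to split, which character is written between components, and how many leading components to drop. The sketch below is a hypothetical, stdlib-only model of that behaviour for simple paths, with skip interpreted as we understand it; it collapses repeated delimiters and ignores trailing ones, so Lucene's handling of such edge cases (and of bufferSize) may differ. ReversePathHierarchyTokenizer produces suffixes instead of prefixes and is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of path-hierarchy tokenization; not Lucene code.
final class PathHierarchySketch {
    static List<String> tokens(String path, char delimiter, char replacement, int skip) {
        boolean leadingDelimiter = !path.isEmpty() && path.charAt(0) == delimiter;
        // Split into non-empty components (repeated delimiters collapse here).
        List<String> parts = new ArrayList<>();
        for (String p : path.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!p.isEmpty()) parts.add(p);
        }
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            // Prefix with the replacement unless this is the very first
            // component of a relative path.
            if (prefix.length() > 0 || leadingDelimiter || skip > 0) {
                prefix.append(replacement);
            }
            prefix.append(parts.get(i));
            out.add(prefix.toString());
        }
        return out;
    }
}
```

Under this sketch, "/usr/local/bin" yields /usr, /usr/local, /usr/local/bin, and "/a/b/c" with skip 1 yields /b, /b/c.
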
Modifier and Type | Method and Description |
---|---|
SimplePatternTokenizer | SimplePatternTokenizerFactory.create(AttributeFactory factory) |
SimplePatternSplitTokenizer | SimplePatternSplitTokenizerFactory.create(AttributeFactory factory) |
PatternTokenizer | PatternTokenizerFactory.create(AttributeFactory factory): Splits the input using the configured pattern. |

Constructor and Description |
---|
PatternTokenizer(AttributeFactory factory, java.util.regex.Pattern pattern, int group): Creates a new PatternTokenizer returning tokens from the given group (-1 for split functionality). |
SimplePatternSplitTokenizer(AttributeFactory factory, Automaton dfa): Runs a pre-built automaton. |
SimplePatternSplitTokenizer(AttributeFactory factory, java.lang.String regexp, int maxDeterminizedStates): See RegExp for the accepted syntax. |
SimplePatternTokenizer(AttributeFactory factory, Automaton dfa): Runs a pre-built automaton. |
SimplePatternTokenizer(AttributeFactory factory, java.lang.String regexp, int maxDeterminizedStates): See RegExp for the accepted syntax. |

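PatternTokenizer's group argument selects between two modes: with -1 the pattern marks separators and the text between matches becomes the tokens; with a group number, the text matched by that group becomes the token. SimplePatternSplitTokenizer and SimplePatternTokenizer offer analogous split and match behaviour using Lucene's RegExp automata instead of java.util.regex. The sketch below models PatternTokenizer's two modes with java.util.regex; it is a hypothetical helper, not the Lucene implementation, and it drops empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of PatternTokenizer's two modes using java.util.regex; not Lucene code.
final class PatternSketch {
    static List<String> tokens(String input, Pattern pattern, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        if (group >= 0) {
            // Match mode: the chosen group of each match is a token.
            while (m.find()) {
                String t = m.group(group);
                if (t != null && !t.isEmpty()) out.add(t);
            }
        } else {
            // Split mode (negative group, e.g. -1): text between matches is a token.
            int last = 0;
            while (m.find()) {
                if (m.start() > last) out.add(input.substring(last, m.start()));
                last = m.end();
            }
            if (last < input.length()) out.add(input.substring(last));
        }
        return out;
    }
}
```

Splitting "a, b,,c" on ",\s*" gives a, b, c; matching "k1=v1; k2=v2" against "(\w+)=(\w+)" with group 2 gives v1, v2.
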
Modifier and Type | Method and Description |
---|---|
ClassicTokenizer | ClassicTokenizerFactory.create(AttributeFactory factory) |
StandardTokenizer | StandardTokenizerFactory.create(AttributeFactory factory) |
UAX29URLEmailTokenizer | UAX29URLEmailTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
ClassicTokenizer(AttributeFactory factory): Creates a new ClassicTokenizer with a given AttributeFactory. |
StandardTokenizer(AttributeFactory factory): Creates a new StandardTokenizer with a given AttributeFactory. |
UAX29URLEmailTokenizer(AttributeFactory factory): Creates a new UAX29URLEmailTokenizer with a given AttributeFactory. |

Modifier and Type | Method and Description |
---|---|
Tokenizer | ThaiTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
ThaiTokenizer(AttributeFactory factory): Creates a new ThaiTokenizer, supplying the AttributeFactory. |

Modifier and Type | Method and Description |
---|---|
abstract Tokenizer | TokenizerFactory.create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory. |
static CharTokenizer | CharTokenizer.fromSeparatorCharPredicate(AttributeFactory factory, java.util.function.IntPredicate separatorCharPredicate): Creates a new instance of CharTokenizer with the supplied attribute factory using a custom predicate, supplied as a method reference or lambda expression. The predicate should return true for characters that separate tokens. |
static CharTokenizer | CharTokenizer.fromTokenCharPredicate(AttributeFactory factory, java.util.function.IntPredicate tokenCharPredicate): Creates a new instance of CharTokenizer with the supplied attribute factory using a custom predicate, supplied as a method reference or lambda expression. The predicate should return true for characters that belong to a token. |

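fromTokenCharPredicate and fromSeparatorCharPredicate both build a CharTokenizer from an IntPredicate over code points: one predicate says which characters belong to tokens, the other which characters separate them. The stdlib-only model below shows that the two are complements of each other. The class and method names are hypothetical, and the sketch omits maxTokenLen handling and offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of predicate-driven tokenization; not Lucene's CharTokenizer.
final class PredicateTokenizerSketch {
    // Emits maximal runs of code points for which isTokenChar returns true.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) out.add(current.toString());
        return out;
    }

    // The separator form is just the negated token-character form.
    static List<String> tokenizeOnSeparators(String text, IntPredicate isSeparator) {
        return tokenize(text, isSeparator.negate());
    }
}
```

In Lucene itself the corresponding call would be CharTokenizer.fromTokenCharPredicate(factory, Character::isLetter); the sketch's tokenize("foo-bar baz", Character::isLetter) yields foo, bar, baz.
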
Constructor and Description |
---|
CharTokenizer(AttributeFactory factory): Creates a new CharTokenizer instance. |
CharTokenizer(AttributeFactory factory, int maxTokenLen): Creates a new CharTokenizer instance. |
SegmentingTokenizerBase(AttributeFactory factory, java.text.BreakIterator iterator): Construct a new SegmentingTokenizerBase, also supplying the AttributeFactory. |

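SegmentingTokenizerBase takes a java.text.BreakIterator that decides where segments end; ThaiTokenizer, as we understand the Lucene sources, is built on it. The sketch below uses the JDK's word BreakIterator directly to show only the segmentation step. It is not Lucene code, the helper name is hypothetical, and it keeps a segment only if it contains a letter or digit. The JDK's word-boundary rules are not guaranteed to match the UAX #29 rules that StandardTokenizer implements.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: word segmentation with the JDK BreakIterator; not Lucene code.
final class BreakIteratorSketch {
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Drop punctuation and whitespace segments.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }
}
```

For "Hello, world! 42" the iterator reports boundaries around each word, comma, space and exclamation mark; the filter keeps Hello, world and 42.
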
Modifier and Type | Method and Description |
---|---|
WikipediaTokenizer | WikipediaTokenizerFactory.create(AttributeFactory factory) |

Constructor and Description |
---|
WikipediaTokenizer(AttributeFactory factory, int tokenOutput, java.util.Set<java.lang.String> untokenizedTypes): Creates a new instance of the WikipediaTokenizer. |

Modifier and Type | Class and Description |
---|---|
class | CollationAttributeFactory: Converts each token into its CollationKey, and then encodes the bytes as an index term. |

Modifier and Type | Method and Description |
---|---|
protected AttributeFactory | CollationKeyAnalyzer.attributeFactory(java.lang.String fieldName) |

Constructor and Description |
---|
CollationAttributeFactory(AttributeFactory delegate, java.text.Collator collator): Create a CollationAttributeFactory, using the supplied AttributeFactory as the factory for all other attributes. |

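CollationAttributeFactory stores each token's CollationKey bytes as the term; the point is that comparing terms byte by byte then follows the collator's order rather than code-point order. The stdlib-only sketch below shows that idea with java.text.Collator. The helper names are hypothetical, and the unsigned comparison relies on the CollationKey.toByteArray documentation's statement that comparing the byte arrays gives the same result as comparing the keys.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helpers illustrating collation keys as sortable bytes; not Lucene code.
final class CollationKeySketch {
    static Collator collator(Locale locale, int strength) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(strength); // e.g. Collator.PRIMARY ignores case and accents
        return c;
    }

    static byte[] sortKey(Collator collator, String token) {
        return collator.getCollationKey(token).toByteArray();
    }

    // Lexicographic comparison of bytes treated as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With an English collator at PRIMARY strength, "apple" sorts before "Banana" (plain code-point order would put "Banana" first), and "Apple" and "apple" produce equal keys.
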
Modifier and Type | Class and Description |
---|---|
private static class | AttributeFactory.DefaultAttributeFactory |
static class | AttributeFactory.StaticImplementationAttributeFactory<A extends AttributeImpl>: Expert: AttributeFactory returning an instance of the given clazz for the attributes it implements. |

Modifier and Type | Field and Description |
---|---|
static AttributeFactory | AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY: This is the default factory that creates AttributeImpl instances using the class name of the supplied Attribute interface class with Impl appended to it. |
private AttributeFactory | AttributeFactory.StaticImplementationAttributeFactory.delegate |
private AttributeFactory | AttributeSource.factory |

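DEFAULT_ATTRIBUTE_FACTORY locates an implementation class by appending Impl to the attribute interface's class name. The reflection-based sketch below models only that naming rule, using hypothetical nested types; it is not Lucene's DefaultAttributeFactory, whose internals this sketch does not attempt to reproduce.

```java
// Hypothetical model of the "interface name + Impl" lookup rule; not Lucene code.
final class ImplNamingSketch {
    interface Attribute {}

    interface FlagsAttribute extends Attribute {
        int getFlags();
    }

    // No FlagsAttribute-style Impl class exists for this one.
    interface MissingAttribute extends Attribute {}

    // Found by name: the binary name of FlagsAttribute plus "Impl".
    public static final class FlagsAttributeImpl implements FlagsAttribute {
        public FlagsAttributeImpl() {}
        @Override public int getFlags() { return 0; }
    }

    static Attribute create(Class<? extends Attribute> iface) {
        String implName = iface.getName() + "Impl";
        try {
            Class<?> implClass = Class.forName(implName, true, iface.getClassLoader());
            return iface.cast(implClass.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No implementation class " + implName, e);
        }
    }
}
```

Because the lookup is purely name-based, a missing or misnamed Impl class only surfaces at runtime, which is one reason an explicit static-implementation factory can be useful.
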
Modifier and Type | Method and Description |
---|---|
AttributeFactory | AttributeSource.getAttributeFactory(): Returns the AttributeFactory in use. |
static <A extends AttributeImpl> AttributeFactory | AttributeFactory.getStaticImplementation(AttributeFactory delegate, java.lang.Class<A> clazz): Returns an AttributeFactory returning an instance of the given clazz for the attributes it implements. |

Modifier and Type | Method and Description |
---|---|
static <A extends AttributeImpl> AttributeFactory | AttributeFactory.getStaticImplementation(AttributeFactory delegate, java.lang.Class<A> clazz): Returns an AttributeFactory returning an instance of the given clazz for the attributes it implements. |

Constructor and Description |
---|
AttributeSource(AttributeFactory factory): An AttributeSource using the supplied AttributeFactory for creating new Attribute instances. |
StaticImplementationAttributeFactory(AttributeFactory delegate, java.lang.Class<A> clazz): Expert: Creates an AttributeFactory returning an instance of clazz for the attributes it implements; for all other attributes, it calls the given delegate factory. |

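getStaticImplementation and the StaticImplementationAttributeFactory constructor combine two factories: one implementation class serves every attribute interface it implements, and any other interface is handed to the delegate. The sketch below models that dispatch with hypothetical types, using a java.util.function.Function in place of AttributeFactory; it is not Lucene code.

```java
import java.util.function.Function;

// Hypothetical model of "static implementation first, delegate otherwise"; not Lucene code.
final class StaticImplSketch {
    interface Attribute {}
    interface TermAttribute extends Attribute {}
    interface OffsetAttribute extends Attribute {}
    interface FlagsAttribute extends Attribute {}

    // One class implementing two attribute interfaces at once.
    public static final class PackedImpl implements TermAttribute, OffsetAttribute {
        public PackedImpl() {}
    }

    public static final class FlagsImpl implements FlagsAttribute {
        public FlagsImpl() {}
    }

    static Function<Class<? extends Attribute>, Attribute> withStaticImplementation(
            Function<Class<? extends Attribute>, Attribute> delegate,
            Class<? extends Attribute> implClass) {
        return iface -> {
            // Serve the interface from implClass if implClass implements it.
            if (iface.isAssignableFrom(implClass)) {
                try {
                    return implClass.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot instantiate " + implClass.getName(), e);
                }
            }
            // Everything else goes to the delegate factory.
            return delegate.apply(iface);
        };
    }
}
```

Packing several attributes into one class this way is an expert optimisation; the delegate keeps every other attribute on its usual implementation.
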