Package | Description |
---|---|
org.apache.lucene.analysis | Text analysis. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.pattern | Set of components for pattern-based (regex) analysis. |
org.apache.lucene.codecs.blocktree | BlockTree terms dictionary. |
org.apache.lucene.index | Code to maintain and access indices. |
org.apache.lucene.search | Code to search indices. |
org.apache.lucene.search.uhighlight | The UnifiedHighlighter: a flexible highlighter that can get offsets from postings, term vectors, or analysis. |
org.apache.lucene.util.automaton | Finite-state automaton for regular expressions. |
org.apache.lucene.util.graph | Utility classes for working with token streams as graphs. |
Modifier and Type | Method and Description |
---|---|
Automaton |
TokenStreamToAutomaton.toAutomaton(TokenStream in)
Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding
automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term. |
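The role of the position attributes in that graph can be pictured with a small plain-Java sketch. It does not use Lucene's classes: `Tok` and `TokenGraphSketch` are hypothetical names, and for brevity each token becomes one whole-term arc rather than a chain of byte or code point arcs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token: term text, position increment, position length. */
class Tok {
  final String term;
  final int posInc;
  final int posLen;

  Tok(String term, int posInc, int posLen) {
    this.term = term;
    this.posInc = posInc;
    this.posLen = posLen;
  }
}

class TokenGraphSketch {
  /** Returns one arc per token, rendered as "from-term->to". */
  static List<String> arcs(List<Tok> tokens) {
    List<String> out = new ArrayList<>();
    int pos = -1; // position before the first token
    for (Tok t : tokens) {
      pos += t.posInc;         // an increment of 0 stacks the token on the previous one
      int to = pos + t.posLen; // a length above 1 lets the token span several positions
      out.add(pos + "-" + t.term + "->" + to);
    }
    return out;
  }
}
```

For the tokens wifi (increment 1, length 2), wi (0, 1), fi (1, 1) and network (1, 1), the sketch yields two parallel paths from node 0 to node 2, followed by a single arc to node 3.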
Modifier and Type | Method and Description |
---|---|
private static Automaton |
ConcatenateGraphFilter.replaceSep(Automaton a,
boolean preserveSep,
int sepLabel) |
Automaton |
ConcatenateGraphFilter.toAutomaton()
Converts the tokenStream to an automaton, treating the transition labels as UTF-8.
|
Automaton |
ConcatenateGraphFilter.toAutomaton(boolean unicodeAware)
Converts the tokenStream to an automaton.
|
Modifier and Type | Method and Description |
---|---|
private static Automaton |
ConcatenateGraphFilter.replaceSep(Automaton a,
boolean preserveSep,
int sepLabel) |
Modifier and Type | Field and Description |
---|---|
private Automaton |
SimplePatternTokenizerFactory.dfa |
private Automaton |
SimplePatternSplitTokenizerFactory.dfa |
Constructor and Description |
---|
SimplePatternSplitTokenizer(AttributeFactory factory,
Automaton dfa)
Runs a pre-built automaton.
|
SimplePatternSplitTokenizer(Automaton dfa)
Runs a pre-built automaton.
|
SimplePatternTokenizer(AttributeFactory factory,
Automaton dfa)
Runs a pre-built automaton.
|
SimplePatternTokenizer(Automaton dfa)
Runs a pre-built automaton.
|
Modifier and Type | Field and Description |
---|---|
(package private) Automaton |
IntersectTermsEnum.automaton |
Constructor and Description |
---|
IntersectTermsEnum(FieldReader fr,
Automaton automaton,
RunAutomaton runAutomaton,
BytesRef commonSuffix,
BytesRef startTerm) |
Modifier and Type | Field and Description |
---|---|
private Automaton |
AutomatonTermsEnum.automaton |
Modifier and Type | Field and Description |
---|---|
(package private) Automaton |
TermAutomatonQuery.TermAutomatonWeight.automaton |
protected Automaton |
AutomatonQuery.automaton
the automaton to match index terms against
|
(package private) Automaton |
TermAutomatonQuery.det |
Modifier and Type | Method and Description |
---|---|
private static Automaton[] |
FuzzyTermsEnum.buildAutomata(int[] termText,
int prefixLength,
boolean transpositions,
int maxEdits) |
static Automaton |
FuzzyTermsEnum.buildAutomaton(java.lang.String text,
int prefixLength,
boolean transpositions,
int maxEdits)
Builds a binary Automaton to match a fuzzy term.
|
Automaton |
AutomatonQuery.getAutomaton()
Returns the automaton used to create this query
|
Automaton |
FuzzyQuery.toAutomaton()
Expert: Constructs an equivalent Automaton accepting terms matched by this query
|
static Automaton |
PrefixQuery.toAutomaton(BytesRef prefix)
Build an automaton accepting all terms with the specified prefix.
|
static Automaton |
TermRangeQuery.toAutomaton(BytesRef lowerTerm,
BytesRef upperTerm,
boolean includeLower,
boolean includeUpper) |
static Automaton |
WildcardQuery.toAutomaton(Term wildcardquery)
Convert Lucene wildcard syntax into an automaton.
|
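To make concrete what language a wildcard or prefix automaton accepts, here is a minimal plain-Java sketch that simulates a small NFA directly over the pattern. It is not Lucene's implementation: `WildcardSketch` is a hypothetical name, it works on Java chars rather than code points, and it ignores escape handling. It assumes `*` matches any run of characters and `?` matches exactly one.

```java
import java.util.HashSet;
import java.util.Set;

class WildcardSketch {
  /** Returns true if text is in the language described by pattern. */
  static boolean matches(String pattern, String text) {
    // Each NFA state is an index into the pattern; pattern.length() is the accept state.
    Set<Integer> states = new HashSet<>();
    addWithClosure(pattern, 0, states);
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      Set<Integer> next = new HashSet<>();
      for (int s : states) {
        if (s == pattern.length()) {
          continue; // the accept state has no outgoing arcs
        }
        char p = pattern.charAt(s);
        if (p == '*') {
          addWithClosure(pattern, s, next); // '*' consumes c and loops
        } else if (p == '?' || p == c) {
          addWithClosure(pattern, s + 1, next);
        }
      }
      states = next;
    }
    return states.contains(pattern.length());
  }

  private static void addWithClosure(String pattern, int s, Set<Integer> out) {
    // '*' may match the empty string, so being at a '*' also implies the state after it.
    while (out.add(s) && s < pattern.length() && pattern.charAt(s) == '*') {
      s++;
    }
  }
}
```

Under these assumptions a prefix match is the special case of a pattern ending in a single `*`, so `matches("lu*", "lucene")` behaves like a prefix test for "lu".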
Constructor and Description |
---|
AutomatonQuery(Term term,
Automaton automaton)
Create a new AutomatonQuery from an Automaton. |
AutomatonQuery(Term term,
Automaton automaton,
int maxDeterminizedStates)
Create a new AutomatonQuery from an Automaton. |
AutomatonQuery(Term term,
Automaton automaton,
int maxDeterminizedStates,
boolean isBinary)
Create a new AutomatonQuery from an Automaton. |
TermAutomatonWeight(Automaton automaton,
IndexSearcher searcher,
java.util.Map<java.lang.Integer,TermStates> termStates,
float boost) |
TermRunAutomaton(Automaton a,
int termCount) |
Modifier and Type | Method and Description |
---|---|
private static CharacterRunAutomaton |
MultiTermHighlighting.binaryToCharRunAutomaton(Automaton binaryAutomaton,
java.lang.String description) |
Modifier and Type | Field and Description |
---|---|
private Automaton |
FiniteStringsIterator.a
Automaton to create finite string from.
|
Automaton |
CompiledAutomaton.automaton
The automaton whose transitions are traversed by state number.
|
private Automaton |
TooComplexToDeterminizeException.automaton |
(package private) Automaton |
RunAutomaton.automaton |
Modifier and Type | Method and Description |
---|---|
static Automaton |
DaciukMihovAutomatonBuilder.build(java.util.Collection<BytesRef> input)
Build a minimal, deterministic automaton from a sorted list of BytesRef representing
strings in UTF-8. |
static Automaton |
Operations.complement(Automaton a,
int maxDeterminizedStates)
Returns a (deterministic) automaton that accepts the complement of the
language of the given automaton.
|
static Automaton |
Operations.concatenate(Automaton a1,
Automaton a2)
Returns an automaton that accepts the concatenation of the languages of the
given automata.
|
static Automaton |
Operations.concatenate(java.util.List<Automaton> l)
Returns an automaton that accepts the concatenation of the languages of the
given automata.
|
Automaton |
UTF32ToUTF8.convert(Automaton utf32)
Converts an incoming UTF-32 automaton to an equivalent
UTF-8 one.
|
static Automaton |
Operations.determinize(Automaton a,
int maxDeterminizedStates)
Determinizes the given automaton.
|
Automaton |
Automaton.Builder.finish()
Compiles all added states and transitions into a new Automaton
and returns it. |
Automaton |
TooComplexToDeterminizeException.getAutomaton()
Returns the automaton that caused this exception, if any.
|
Automaton |
AutomatonProvider.getAutomaton(java.lang.String name)
Returns automaton of the given name.
|
static Automaton |
Operations.intersection(Automaton a1,
Automaton a2)
Returns an automaton that accepts the intersection of the languages of the
given automata.
|
static Automaton |
Automata.makeAnyBinary()
Returns a new (deterministic) automaton that accepts all binary terms.
|
static Automaton |
Automata.makeAnyChar()
Returns a new (deterministic) automaton that accepts any single codepoint.
|
static Automaton |
Automata.makeAnyString()
Returns a new (deterministic) automaton that accepts all strings.
|
static Automaton |
Automata.makeBinary(BytesRef term)
Returns a new (deterministic) automaton that accepts the single given
binary term.
|
static Automaton |
Automata.makeBinaryInterval(BytesRef min,
boolean minInclusive,
BytesRef max,
boolean maxInclusive)
Creates a new deterministic, minimal automaton accepting
all binary terms in the specified interval.
|
static Automaton |
Automata.makeChar(int c)
Returns a new (deterministic) automaton that accepts a single codepoint of
the given value.
|
static Automaton |
Automata.makeCharRange(int min,
int max)
Returns a new (deterministic) automaton that accepts a single codepoint whose
value is in the given interval (including both end points).
|
static Automaton |
Automata.makeDecimalInterval(int min,
int max,
int digits)
Returns a new automaton that accepts strings representing decimal (base 10)
non-negative integers in the given interval.
|
static Automaton |
Automata.makeEmpty()
Returns a new (deterministic) automaton with the empty language.
|
static Automaton |
Automata.makeEmptyString()
Returns a new (deterministic) automaton that accepts only the empty string.
|
static Automaton |
Automata.makeString(int[] word,
int offset,
int length)
Returns a new (deterministic) automaton that accepts the single given
string from the specified unicode code points.
|
static Automaton |
Automata.makeString(java.lang.String s)
Returns a new (deterministic) automaton that accepts the single given
string.
|
static Automaton |
Automata.makeStringUnion(java.util.Collection<BytesRef> utf8Strings)
Returns a new (deterministic and minimal) automaton that accepts the union
of the given collection of BytesRefs representing UTF-8 encoded strings. |
static Automaton |
MinimizationOperations.minimize(Automaton a,
int maxDeterminizedStates)
Minimizes (and determinizes if not already deterministic) the given
automaton using Hopcroft's algorithm.
|
static Automaton |
Operations.minus(Automaton a1,
Automaton a2,
int maxDeterminizedStates)
Returns a (deterministic) automaton that accepts the intersection of the
language of a1 and the complement of the language of a2. |
static Automaton |
Operations.optional(Automaton a)
Returns an automaton that accepts the union of the empty string and the
language of the given automaton.
|
static Automaton |
Operations.removeDeadStates(Automaton a)
Removes transitions to dead states (a state is "dead" if it is not
reachable from the initial state or no accept state is reachable from it).
|
static Automaton |
Operations.repeat(Automaton a)
Returns an automaton that accepts the Kleene star (zero or more
concatenated repetitions) of the language of the given automaton.
|
static Automaton |
Operations.repeat(Automaton a,
int count)
Returns an automaton that accepts count or more concatenated
repetitions of the language of the given automaton. |
static Automaton |
Operations.repeat(Automaton a,
int min,
int max)
Returns an automaton that accepts between min and max (including both)
concatenated repetitions of the language of the given automaton. |
static Automaton |
Operations.reverse(Automaton a)
Returns an automaton accepting the reverse language.
|
(package private) static Automaton |
Operations.reverse(Automaton a,
java.util.Set<java.lang.Integer> initialStates)
Reverses the automaton, returning the new initial states.
|
Automaton |
RegExp.toAutomaton()
Constructs a new Automaton from this RegExp. |
Automaton |
RegExp.toAutomaton(AutomatonProvider automaton_provider,
int maxDeterminizedStates)
Constructs a new Automaton from this RegExp. |
Automaton |
LevenshteinAutomata.toAutomaton(int n)
Compute a DFA that accepts all strings within an edit distance of n. |
Automaton |
RegExp.toAutomaton(int maxDeterminizedStates)
Constructs a new Automaton from this RegExp. |
Automaton |
LevenshteinAutomata.toAutomaton(int n,
java.lang.String prefix)
Compute a DFA that accepts all strings within an edit distance of n,
matching the specified exact prefix. |
private Automaton |
RegExp.toAutomaton(java.util.Map<java.lang.String,Automaton> automata,
AutomatonProvider automaton_provider,
int maxDeterminizedStates) |
Automaton |
RegExp.toAutomaton(java.util.Map<java.lang.String,Automaton> automata,
int maxDeterminizedStates)
Constructs a new Automaton from this RegExp. |
private Automaton |
RegExp.toAutomatonInternal(java.util.Map<java.lang.String,Automaton> automata,
AutomatonProvider automaton_provider,
int maxDeterminizedStates) |
(package private) static Automaton |
Operations.totalize(Automaton a)
Returns a new automaton accepting the same language with added
transitions to a dead state so that from every state and every label
there is a transition.
|
static Automaton |
Operations.union(Automaton a1,
Automaton a2)
Returns an automaton that accepts the union of the languages of the given
automata.
|
static Automaton |
Operations.union(java.util.Collection<Automaton> l)
Returns an automaton that accepts the union of the languages of the given
automata.
|
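The relationship between complement, intersection and minus described above can be sketched on a tiny table-based DFA. This is an illustration in plain Java, not Lucene's Automaton class: `DfaSketch` is a hypothetical name, and the sketch assumes both inputs are already deterministic and total over the same small alphabet, with state 0 as the initial state.

```java
class DfaSketch {
  final int[][] next;     // next[state][symbol]; total, so every entry is a valid state
  final boolean[] accept; // accept[state]

  DfaSketch(int[][] next, boolean[] accept) {
    this.next = next;
    this.accept = accept;
  }

  /** Runs the word from state 0 and reports whether it ends in an accept state. */
  boolean run(int[] word) {
    int s = 0;
    for (int sym : word) {
      s = next[s][sym];
    }
    return accept[s];
  }

  /** For a total DFA, the complement only flips the accept states. */
  DfaSketch complement() {
    boolean[] flipped = new boolean[accept.length];
    for (int i = 0; i < accept.length; i++) {
      flipped[i] = !accept[i];
    }
    return new DfaSketch(next, flipped);
  }

  /** Product construction: the pair (i, j) is encoded as state i * m + j. */
  static DfaSketch intersection(DfaSketch a, DfaSketch b) {
    int n = a.accept.length;
    int m = b.accept.length;
    int k = a.next[0].length;
    int[][] next = new int[n * m][k];
    boolean[] accept = new boolean[n * m];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < m; j++) {
        int p = i * m + j;
        accept[p] = a.accept[i] && b.accept[j];
        for (int sym = 0; sym < k; sym++) {
          next[p][sym] = a.next[i][sym] * m + b.next[j][sym];
        }
      }
    }
    return new DfaSketch(next, accept);
  }

  /** minus(a, b) is the intersection of a with the complement of b. */
  static DfaSketch minus(DfaSketch a, DfaSketch b) {
    return intersection(a, b.complement());
  }
}
```

The flip in `complement()` is only valid because every state has a transition for every symbol, which is why a totalizing and determinizing step is needed before complementing an arbitrary automaton.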
Modifier and Type | Method and Description |
---|---|
static int |
Automata.appendAnyChar(Automaton a,
int state)
Accepts any single character starting from the specified state, returning the new state.
|
static int |
Automata.appendChar(Automaton a,
int state,
int c)
Appends the specified character to the specified state, returning a new state.
|
static Automaton |
Operations.complement(Automaton a,
int maxDeterminizedStates)
Returns a (deterministic) automaton that accepts the complement of the
language of the given automaton.
|
static Automaton |
Operations.concatenate(Automaton a1,
Automaton a2)
Returns an automaton that accepts the concatenation of the languages of the
given automata.
|
Automaton |
UTF32ToUTF8.convert(Automaton utf32)
Converts an incoming UTF-32 automaton to an equivalent
UTF-8 one.
|
void |
Automaton.copy(Automaton other)
Copies over all states/transitions from other.
|
void |
Automaton.Builder.copy(Automaton other)
Copies over all states/transitions from other.
|
void |
Automaton.Builder.copyStates(Automaton other)
Copies over all states from other.
|
static Automaton |
Operations.determinize(Automaton a,
int maxDeterminizedStates)
Determinizes the given automaton.
|
private static int |
CompiledAutomaton.findSinkState(Automaton automaton)
Returns sink state, if present, else -1.
|
static java.lang.String |
Operations.getCommonPrefix(Automaton a)
Returns the longest string that is a prefix of all accepted strings and
visits each state at most once.
|
static BytesRef |
Operations.getCommonPrefixBytesRef(Automaton a)
Returns the longest BytesRef that is a prefix of all accepted strings and
visits each state at most once.
|
static BytesRef |
Operations.getCommonSuffixBytesRef(Automaton a,
int maxDeterminizedStates)
Returns the longest BytesRef that is a suffix of all accepted strings.
|
private static java.util.BitSet |
Operations.getLiveStates(Automaton a)
Returns the set of live states.
|
private static java.util.BitSet |
Operations.getLiveStatesFromInitial(Automaton a)
Returns bitset marking states reachable from the initial state.
|
private static java.util.BitSet |
Operations.getLiveStatesToAccept(Automaton a)
Returns bitset marking states that can reach an accept state.
|
static IntsRef |
Operations.getSingleton(Automaton a)
If this automaton accepts a single input, return it.
|
static boolean |
Operations.hasDeadStates(Automaton a)
Returns true if this automaton has any states that cannot
be reached from the initial state or cannot reach an accept state.
|
static boolean |
Operations.hasDeadStatesFromInitial(Automaton a)
Returns true if there are dead states reachable from an initial state.
|
static boolean |
Operations.hasDeadStatesToAccept(Automaton a)
Returns true if there are dead states that reach an accept state.
|
static Automaton |
Operations.intersection(Automaton a1,
Automaton a2)
Returns an automaton that accepts the intersection of the languages of the
given automata.
|
static boolean |
Operations.isEmpty(Automaton a)
Returns true if the given automaton accepts no strings.
|
static boolean |
Operations.isFinite(Automaton a)
Returns true if the language of this automaton is finite.
|
private static boolean |
Operations.isFinite(Transition scratch,
Automaton a,
int state,
java.util.BitSet path,
java.util.BitSet visited,
int level)
Checks whether there is a loop containing state.
|
static boolean |
Operations.isTotal(Automaton a)
Returns true if the given automaton accepts all strings.
|
static boolean |
Operations.isTotal(Automaton a,
int minAlphabet,
int maxAlphabet)
Returns true if the given automaton accepts all strings for the specified min/max
range of the alphabet.
|
static Automaton |
MinimizationOperations.minimize(Automaton a,
int maxDeterminizedStates)
Minimizes (and determinizes if not already deterministic) the given
automaton using Hopcroft's algorithm.
|
static Automaton |
Operations.minus(Automaton a1,
Automaton a2,
int maxDeterminizedStates)
Returns a (deterministic) automaton that accepts the intersection of the
language of a1 and the complement of the language of a2. |
int |
FiniteStringsIterator.PathNode.nextLabel(Automaton a)
Returns next label of current transition, or
advances to next transition and returns its first
label, if current one is exhausted.
|
static Automaton |
Operations.optional(Automaton a)
Returns an automaton that accepts the union of the empty string and the
language of the given automaton.
|
static Automaton |
Operations.removeDeadStates(Automaton a)
Removes transitions to dead states (a state is "dead" if it is not
reachable from the initial state or no accept state is reachable from it).
|
static Automaton |
Operations.repeat(Automaton a)
Returns an automaton that accepts the Kleene star (zero or more
concatenated repetitions) of the language of the given automaton.
|
static Automaton |
Operations.repeat(Automaton a,
int count)
Returns an automaton that accepts count or more concatenated
repetitions of the language of the given automaton. |
static Automaton |
Operations.repeat(Automaton a,
int min,
int max)
Returns an automaton that accepts between min and max (including both)
concatenated repetitions of the language of the given automaton. |
void |
FiniteStringsIterator.PathNode.resetState(Automaton a,
int state) |
static Automaton |
Operations.reverse(Automaton a)
Returns an automaton accepting the reverse language.
|
(package private) static Automaton |
Operations.reverse(Automaton a,
java.util.Set<java.lang.Integer> initialStates)
Reverses the automaton, returning the new initial states.
|
static boolean |
Operations.run(Automaton a,
IntsRef s)
Returns true if the given string (expressed as unicode codepoints) is accepted by the automaton.
|
static boolean |
Operations.run(Automaton a,
java.lang.String s)
Returns true if the given string is accepted by the automaton.
|
static boolean |
Operations.sameLanguage(Automaton a1,
Automaton a2)
Returns true if these two automata accept exactly the
same language.
|
static boolean |
Operations.subsetOf(Automaton a1,
Automaton a2)
Returns true if the language of a1 is a subset of the language of a2. |
static int[] |
Operations.topoSortStates(Automaton a)
Returns the topological sort of all states reachable from
the initial state.
|
private static int |
Operations.topoSortStatesRecurse(Automaton a,
java.util.BitSet visited,
int[] states,
int upto,
int state,
int level) |
private static java.util.Set<java.lang.Integer> |
Operations.toSet(Automaton a,
int offset) |
(package private) static Automaton |
Operations.totalize(Automaton a)
Returns a new automaton accepting the same language with added
transitions to a dead state so that from every state and every label
there is a transition.
|
static Automaton |
Operations.union(Automaton a1,
Automaton a2)
Returns an automaton that accepts the union of the languages of the given
automata.
|
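The finiteness check listed above ("checks whether there is a loop containing state") can be sketched as a depth-first search that tracks the current path. This is a plain-Java illustration under stated assumptions, not Lucene's code: `FiniteSketch` is a hypothetical name, the automaton is given as adjacency lists with state 0 as the initial state, every state is assumed to be live (no dead states), and recursion is used without a depth limit.

```java
import java.util.BitSet;

class FiniteSketch {
  /** edges[s] lists the destination states of s. The language is finite if no loop is reachable. */
  static boolean isFinite(int[][] edges) {
    return edges.length == 0 || !hasCycle(edges, 0, new BitSet(), new BitSet());
  }

  private static boolean hasCycle(int[][] edges, int state, BitSet path, BitSet visited) {
    path.set(state); // state is on the path currently being explored
    for (int dest : edges[state]) {
      if (path.get(dest)) {
        return true; // back edge: a loop passes through dest
      }
      if (!visited.get(dest) && hasCycle(edges, dest, path, visited)) {
        return true;
      }
    }
    path.clear(state);
    visited.set(state); // fully explored, no loop reachable from here
    return false;
  }
}
```

The `visited` set keeps shared suffixes from being explored more than once, so a diamond-shaped graph is still reported as finite.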
Modifier and Type | Method and Description |
---|---|
static Automaton |
Operations.concatenate(java.util.List<Automaton> l)
Returns an automaton that accepts the concatenation of the languages of the
given automata.
|
private void |
RegExp.findLeaves(RegExp exp,
RegExp.Kind kind,
java.util.List<Automaton> list,
java.util.Map<java.lang.String,Automaton> automata,
AutomatonProvider automaton_provider,
int maxDeterminizedStates) |
private Automaton |
RegExp.toAutomaton(java.util.Map<java.lang.String,Automaton> automata,
AutomatonProvider automaton_provider,
int maxDeterminizedStates) |
Automaton |
RegExp.toAutomaton(java.util.Map<java.lang.String,Automaton> automata,
int maxDeterminizedStates)
Constructs a new Automaton from this RegExp. |
private Automaton |
RegExp.toAutomatonInternal(java.util.Map<java.lang.String,Automaton> automata,
AutomatonProvider automaton_provider,
int maxDeterminizedStates) |
static Automaton |
Operations.union(java.util.Collection<Automaton> l)
Returns an automaton that accepts the union of the languages of the given
automata.
|
Constructor and Description |
---|
ByteRunAutomaton(Automaton a)
Converts the incoming automaton to byte-based (UTF32ToUTF8) first.
|
ByteRunAutomaton(Automaton a,
boolean isBinary,
int maxDeterminizedStates)
Expert: if isBinary is true, the input is already byte-based.
|
CharacterRunAutomaton(Automaton a)
Construct with a default number of maxDeterminizedStates.
|
CharacterRunAutomaton(Automaton a,
int maxDeterminizedStates)
Construct specifying maxDeterminizedStates.
|
CompiledAutomaton(Automaton automaton)
Create this, passing simplify=true and finite=null, so that we try
to simplify the automaton and determine if it is finite.
|
CompiledAutomaton(Automaton automaton,
java.lang.Boolean finite,
boolean simplify)
Create this.
|
CompiledAutomaton(Automaton automaton,
java.lang.Boolean finite,
boolean simplify,
int maxDeterminizedStates,
boolean isBinary)
Create this.
|
FiniteStringsIterator(Automaton a)
Constructor.
|
FiniteStringsIterator(Automaton a,
int startState,
int endState)
Constructor.
|
LimitedFiniteStringsIterator(Automaton a,
int limit)
Constructor.
|
RunAutomaton(Automaton a,
int alphabetSize)
Constructs a new RunAutomaton from a deterministic Automaton. |
RunAutomaton(Automaton a,
int alphabetSize,
int maxDeterminizedStates)
Constructs a new RunAutomaton from a deterministic Automaton. |
TooComplexToDeterminizeException(Automaton automaton,
int maxDeterminizedStates)
Use this constructor when the automaton failed to determinize.
|
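The idea behind building a run-time matcher from a deterministic automaton and an alphabet size can be sketched as a flat transition table. This is a simplified plain-Java illustration, not Lucene's RunAutomaton: `RunTableSketch` and its constructor arguments are hypothetical, it stores one table entry per label instead of compressing labels into ranges, and it assumes state 0 is the initial state.

```java
import java.util.Arrays;

class RunTableSketch {
  private final int alphabetSize;
  private final int[] table;      // table[state * alphabetSize + label]; -1 means no transition
  private final boolean[] accept;

  /** arcs holds {from, label, to} triples of a deterministic automaton. */
  RunTableSketch(int alphabetSize, int[][] arcs, boolean[] accept) {
    this.alphabetSize = alphabetSize;
    this.accept = accept.clone();
    this.table = new int[accept.length * alphabetSize];
    Arrays.fill(table, -1);
    for (int[] arc : arcs) {
      table[arc[0] * alphabetSize + arc[1]] = arc[2];
    }
  }

  /** Returns the next state, or -1 if the label has no transition from this state. */
  int step(int state, int label) {
    return table[state * alphabetSize + label];
  }

  /** Runs all labels from state 0; a missing transition rejects immediately. */
  boolean run(int[] labels) {
    int s = 0;
    for (int label : labels) {
      s = step(s, label);
      if (s == -1) {
        return false;
      }
    }
    return accept[s];
  }
}
```

Each step is a single array lookup, which is the reason for requiring a deterministic input: with at most one destination per state and label, no set of active states has to be tracked at match time.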
Modifier and Type | Field and Description |
---|---|
private Automaton |
GraphTokenStreamFiniteStrings.det |
Modifier and Type | Method and Description |
---|---|
private Automaton |
GraphTokenStreamFiniteStrings.build(TokenStream in)
Build an automaton from the provided TokenStream. |
Modifier and Type | Method and Description |
---|---|
private static void |
GraphTokenStreamFiniteStrings.articulationPointsRecurse(Automaton a,
int state,
int d,
int[] depth,
int[] low,
int[] parent,
java.util.BitSet visited,
java.util.List<java.lang.Integer> points) |