public final class DaciukMihovAutomatonBuilder
extends java.lang.Object
Automaton
that accepts a set of
strings. The algorithm requires sorted input data, but is very fast
(nearly linear with the input size).Modifier and Type | Class and Description |
---|---|
private static class |
DaciukMihovAutomatonBuilder.State
DFSA state with
char labels on transitions. |
Modifier and Type | Field and Description |
---|---|
private static java.util.Comparator<CharsRef> |
comparator
A comparator used for enforcing sorted UTF8 order, used in assertions only.
|
(package private) static int |
MAX_TERM_LENGTH
This builder rejects terms that are more than 1k chars long since it then
uses recursion based on the length of the string, which might cause stack
overflows.
|
private CharsRef |
previous
Previous sequence added to the automaton in
add(CharsRef) . |
private DaciukMihovAutomatonBuilder.State |
root
Root automaton state.
|
private java.util.HashMap<DaciukMihovAutomatonBuilder.State,DaciukMihovAutomatonBuilder.State> |
stateRegistry
A "registry" for state interning.
|
Modifier | Constructor and Description |
---|---|
private |
DaciukMihovAutomatonBuilder()
The default constructor is private.
|
Modifier and Type | Method and Description |
---|---|
void |
add(CharsRef current)
Add another character sequence to this automaton.
|
private void |
addSuffix(DaciukMihovAutomatonBuilder.State state,
java.lang.CharSequence current,
int fromIndex)
Add a suffix of
current starting at fromIndex
(inclusive) to state state . |
static Automaton |
build(java.util.Collection<BytesRef> input)
Build a minimal, deterministic automaton from a sorted list of
BytesRef representing
strings in UTF-8. |
DaciukMihovAutomatonBuilder.State |
complete()
Finalize the automaton and return the root state.
|
private static int |
convert(Automaton.Builder a,
DaciukMihovAutomatonBuilder.State s,
java.util.IdentityHashMap<DaciukMihovAutomatonBuilder.State,java.lang.Integer> visited)
Internal recursive traversal for conversion.
|
private void |
replaceOrRegister(DaciukMihovAutomatonBuilder.State state)
Replace last child of
state with an already registered state
or stateRegistry the last child state. |
private boolean |
setPrevious(CharsRef current)
Copy
current into an internal buffer. |
static final int MAX_TERM_LENGTH
private java.util.HashMap<DaciukMihovAutomatonBuilder.State,DaciukMihovAutomatonBuilder.State> stateRegistry
private DaciukMihovAutomatonBuilder.State root
private CharsRef previous
add(CharsRef)
.private static final java.util.Comparator<CharsRef> comparator
private DaciukMihovAutomatonBuilder()
public void add(CharsRef current)
public DaciukMihovAutomatonBuilder.State complete()
private static int convert(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, java.util.IdentityHashMap<DaciukMihovAutomatonBuilder.State,java.lang.Integer> visited)
public static Automaton build(java.util.Collection<BytesRef> input)
BytesRef
representing
strings in UTF-8. These strings must be binary-sorted.private boolean setPrevious(CharsRef current)
current
into an internal buffer.private void replaceOrRegister(DaciukMihovAutomatonBuilder.State state)
state
with an already registered state
or stateRegistry the last child state.private void addSuffix(DaciukMihovAutomatonBuilder.State state, java.lang.CharSequence current, int fromIndex)
current
starting at fromIndex
(inclusive) to state state
.