public final class CFSA extends FSA
FSA5
offering smaller automata size
at some (minor) performance penalty.
Note: Serialize to CFSA2
for new code.
The encoding of automaton body is as follows.
---- FSA header (standard) Byte Description +-+-+-+-+-+-+-+-+\ 0 | | | | | | | | | +------ '\' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 1 | | | | | | | | | +------ 'f' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 2 | | | | | | | | | +------ 's' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 3 | | | | | | | | | +------ 'a' +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 4 | | | | | | | | | +------ version (fixed 0xc5) +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 5 | | | | | | | | | +------ filler character +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 6 | | | | | | | | | +------ annot character +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 7 |C|C|C|C|G|G|G|G| +------ C - node data size (ctl), G - address size (gotoLength) +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 8-32 | | | | | | | | | +------ labels mapped for type (1) of arc encoding. : : : : : : : : : | +-+-+-+-+-+-+-+-+/ ---- Start of a node; only if automaton was compiled with NUMBERS option. Byte +-+-+-+-+-+-+-+-+\ 0 | | | | | | | | | \ LSB +-+-+-+-+-+-+-+-+ + 1 | | | | | | | | | | number of strings recognized +-+-+-+-+-+-+-+-+ +----- by the automaton starting : : : : : : : : : | from this node. +-+-+-+-+-+-+-+-+ + ctl-1 | | | | | | | | | / MSB +-+-+-+-+-+-+-+-+/ ---- A vector of node's arcs. Conditional format, depending on flags. 1) NEXT bit set, mapped arc label. +--------------- arc's label mapped in M bits if M's field value > 0 | +------------- node pointed to is next | | +----------- the last arc of the node _______| | | +--------- the arc is final / | | | | +-+-+-+-+-+-+-+-+\ 0 |M|M|M|M|M|1|L|F| +------ flags + (M) index of the mapped label. +-+-+-+-+-+-+-+-+/ 2) NEXT bit set, label separate. +--------------- arc's label stored separately (M's field is zero). | +------------- node pointed to is next | | +----------- the last arc of the node | | | +--------- the arc is final | | | | +-+-+-+-+-+-+-+-+\ 0 |0|0|0|0|0|1|L|F| +------ flags +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 1 | | | | | | | | | +------ label +-+-+-+-+-+-+-+-+/ 3) NEXT bit not set. Full arc. +------------- node pointed to is next | +----------- the last arc of the node | | +--------- the arc is final | | | +-+-+-+-+-+-+-+-+\ 0 |A|A|A|A|A|0|L|F| +------ flags + (A) address field, lower bits +-+-+-+-+-+-+-+-+/ +-+-+-+-+-+-+-+-+\ 1 | | | | | | | | | +------ label +-+-+-+-+-+-+-+-+/ : : : : : : : : : +-+-+-+-+-+-+-+-+\ gtl-1 |A|A|A|A|A|A|A|A| +------ address, continuation (MSB) +-+-+-+-+-+-+-+-+/
Modifier and Type | Field and Description |
---|---|
byte[] |
arcs
An array of bytes with the internal representation of the automaton.
|
static int |
BIT_FINAL_ARC
Bitmask indicating that an arc corresponds to the last character of a
sequence available when building the automaton.
|
static int |
BIT_LAST_ARC
Bitmask indicating that an arc is the last one of the node's list and the
following one belongs to another node.
|
static int |
BIT_TARGET_NEXT
Bitmask indicating that the target node of this arc follows it in the
compressed automaton structure (no goto field).
|
int |
gtl
Number of bytes each address takes in full, expanded form (goto length).
|
byte[] |
labelMapping
Label mapping for arcs of type (1) (see class documentation).
|
int |
nodeDataLength
The length of the node header structure (if the automaton was compiled with
NUMBERS option). |
static byte |
VERSION
Automaton header version value.
|
Modifier and Type | Method and Description |
---|---|
int |
getArc(int node,
byte label) |
byte |
getArcLabel(int arc) |
int |
getEndNode(int arc) |
int |
getFirstArc(int node) |
Set<FSAFlags> |
getFlags() |
int |
getNextArc(int arc) |
int |
getRightLanguageCount(int node) |
int |
getRootNode()
Returns the start node of this automaton.
|
boolean |
isArcFinal(int arc) |
boolean |
isArcLast(int arc)
Returns
true if this arc has NEXT bit set. |
boolean |
isArcTerminal(int arc) |
boolean |
isLabelCompressed(int arc) |
boolean |
isNextSet(int arc) |
getArcCount, getSequences, getSequences, iterator, read, read, readRemaining, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
forEach, spliterator
public static final byte VERSION
public static final int BIT_FINAL_ARC
public static final int BIT_LAST_ARC
public static final int BIT_TARGET_NEXT
public byte[] arcs
public final int nodeDataLength
NUMBERS
option). Otherwise zero.public final int gtl
public final byte[] labelMapping
public int getRootNode()
0
if
the start node is also an end node.getRootNode
in class FSA
public final int getFirstArc(int node)
getFirstArc
in class FSA
node
- Identifier of the node.node
or 0 if the node has no outgoing arcs.public final int getNextArc(int arc)
getNextArc
in class FSA
arc
- The arc's identifier.arc
and
leaving node
. Zero is returned if no more arcs are
available for the node.public int getArc(int node, byte label)
public int getEndNode(int arc)
getEndNode
in class FSA
arc
- The arc's identifier.arc
.
Terminal arcs (those that point to a terminal state) have no end
node representation and throw a runtime exception.public byte getArcLabel(int arc)
getArcLabel
in class FSA
arc
- The arc's identifier.arc
.public int getRightLanguageCount(int node)
getRightLanguageCount
in class FSA
node
- Identifier of the node.FSAFlags.NUMBERS
. The size
of the right language of the state, in other words.public boolean isArcFinal(int arc)
isArcFinal
in class FSA
arc
- The arc's identifier.true
if the destination node at the end of
this arc
corresponds to an input sequence created when
building this automaton.public boolean isArcTerminal(int arc)
isArcTerminal
in class FSA
arc
- The arc's identifier.true
if this arc
does not have a
terminating node (@link FSA.getEndNode(int)
will throw an
exception). Implies FSA.isArcFinal(int)
.public boolean isArcLast(int arc)
true
if this arc has NEXT
bit set.arc
- The node's arc identifier.BIT_LAST_ARC
public boolean isNextSet(int arc)
arc
- The node's arc identifier.BIT_TARGET_NEXT
is set for this arc.BIT_TARGET_NEXT
public boolean isLabelCompressed(int arc)
arc
- The node's arc identifier.true
if the label is compressed inside flags byte.public Set<FSAFlags> getFlags()
For this automaton version, an additional FSAFlags.NUMBERS
flag
may be set to indicate the automaton contains extra fields for each node.
Copyright © 2016. All rights reserved.