org.apache.regexp

Class RECompiler

Known Direct Subclasses:
REDebugCompiler

public class RECompiler
extends java.lang.Object

A regular expression compiler class. This class compiles a pattern string into a regular expression program interpretable by the RE evaluator class. The 'recompile' command line tool uses this compiler to pre-compile regular expressions for use with RE. For a description of the syntax accepted by RECompiler and what you can do with regular expressions, see the documentation for the RE matcher class.
Version:
$Id: RECompiler.java 232365 2005-08-12 19:47:04Z vgritsenko $
Authors:
Jonathan Locke
Michael McCallum
See Also:
RE, recompile

Nested Class Summary

(package private) class
RECompiler.RERange
Local, nested class for maintaining character ranges for character classes.

Field Summary

(package private) static int
ESC_BACKREF
(package private) static int
ESC_CLASS
(package private) static int
ESC_COMPLEX
(package private) static int
ESC_MASK
(package private) static int
NODE_NORMAL
(package private) static int
NODE_NULLABLE
(package private) static int
NODE_TOPLEVEL
(package private) int[]
bracketEnd
(package private) int[]
bracketMin
(package private) int[]
bracketOpt
(package private) int[]
bracketStart
(package private) static int
bracketUnbounded
(package private) int
brackets
(package private) static Hashtable
hashPOSIX
(package private) int
idx
(package private) char[]
instruction
(package private) int
len
(package private) int
lenInstruction
(package private) int
maxBrackets
(package private) int
parens
(package private) String
pattern

Constructor Summary

RECompiler()
Constructor.

Method Summary

(package private) void
allocBrackets()
Allocate storage for brackets only as needed.
(package private) int
atom()
Absorb an atomic character string.
(package private) void
bracket()
Match a bracket {m,n} expression and put the results in the bracket member variables.
(package private) int
branch(int[] flags)
Compile one branch of an or operator (implements concatenation)
(package private) int
characterClass()
Compile a character class
(package private) int
closure(int[] flags)
Compile a possibly closured terminal
REProgram
compile(String pattern)
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
(package private) void
emit(char c)
Emit a single character into the program stream.
(package private) void
ensure(int n)
Ensures that n more characters can fit in the program buffer.
(package private) int
escape()
Match an escape sequence.
(package private) int
expr(int[] flags)
Compile an expression with possible parens around it.
(package private) void
internalError()
Throws a new internal error exception
(package private) int
node(char opcode, int opdata)
Adds a new node
(package private) void
nodeInsert(char opcode, int opdata, int insertAt)
Inserts a node with a given opcode and opdata at insertAt.
(package private) void
reallocBrackets()
Enlarge storage for brackets only as needed.
(package private) void
setNextOfEnd(int node, int pointTo)
Appends a node to the end of a node chain
(package private) void
syntaxError(String s)
Throws a new syntax error exception
(package private) int
terminal(int[] flags)
Match a terminal node.

Field Details

ESC_BACKREF

(package private) static final int ESC_BACKREF
Field Value:
1048575

ESC_CLASS

(package private) static final int ESC_CLASS
Field Value:
1048573

ESC_COMPLEX

(package private) static final int ESC_COMPLEX
Field Value:
1048574

ESC_MASK

(package private) static final int ESC_MASK
Field Value:
1048560
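
The four ESC_* values are easier to compare in hexadecimal. The sketch below restates the documented decimal values and checks that every ESC_* code has all bits of ESC_MASK set. That the mask is used to separate special escape codes from ordinary character values is an assumption drawn from the numbers, not something this page states; the class name EscCodes and the helper isSpecial are hypothetical.

```java
// Hypothetical illustration: only the numeric values come from the field details.
class EscCodes {
    static final int ESC_MASK    = 0xFFFF0; // 1048560
    static final int ESC_BACKREF = 0xFFFFF; // 1048575
    static final int ESC_COMPLEX = 0xFFFFE; // 1048574
    static final int ESC_CLASS   = 0xFFFFD; // 1048573

    // Assumed usage: a value is a special escape code when all mask bits are set.
    // A 16-bit char never has bits 16..19 set, so it can never pass this test.
    static boolean isSpecial(int code) {
        return (code & ESC_MASK) == ESC_MASK;
    }

    public static void main(String[] args) {
        System.out.println(isSpecial(ESC_BACKREF)); // true
        System.out.println(isSpecial('n'));         // false
    }
}
```

Because the codes sit above the 16-bit char range, a single int return value can carry either a plain character or one of the special codes.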

NODE_NORMAL

(package private) static final int NODE_NORMAL
Field Value:
0

NODE_NULLABLE

(package private) static final int NODE_NULLABLE
Field Value:
1

NODE_TOPLEVEL

(package private) static final int NODE_TOPLEVEL
Field Value:
2

bracketEnd

(package private)  int[] bracketEnd

bracketMin

(package private)  int[] bracketMin

bracketOpt

(package private)  int[] bracketOpt

bracketStart

(package private)  int[] bracketStart

bracketUnbounded

(package private) static final int bracketUnbounded
Field Value:
-1

brackets

(package private)  int brackets

hashPOSIX

(package private) static Hashtable hashPOSIX

idx

(package private)  int idx

instruction

(package private)  char[] instruction

len

(package private)  int len

lenInstruction

(package private)  int lenInstruction

maxBrackets

(package private)  int maxBrackets

parens

(package private)  int parens

pattern

(package private)  String pattern

Constructor Details

RECompiler

public RECompiler()
Constructor. Creates (initially empty) storage for a regular expression program.

Method Details

allocBrackets

(package private)  void allocBrackets()
Allocate storage for brackets only as needed.

atom

(package private)  int atom()
            throws RESyntaxException
Absorb an atomic character string. This method is a little tricky because it can un-include the last character of the string if a closure operator follows. This is correct because the closure operators *, + and ? have higher precedence than concatenation (thus ABC* means AB(C*) and NOT (ABC)*).
Returns:
Index of new atom node
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.
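
The precedence rule described for atom() can be checked with any engine that binds closure more tightly than concatenation. The sketch below uses the JDK's java.util.regex as a stand-in, not the org.apache.regexp engine; the class name ClosurePrecedence is hypothetical.

```java
// Uses java.util.regex (JDK) as a stand-in. It is not the org.apache.regexp
// engine, but it applies the same precedence of closure over concatenation.
class ClosurePrecedence {
    // True if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        return java.util.regex.Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ABC*", "AB"));       // true: C* matches zero C's
        System.out.println(matches("ABC*", "ABCCC"));    // true: only C repeats
        System.out.println(matches("ABC*", "ABCABC"));   // false: ABC is not repeated
        System.out.println(matches("(ABC)*", "ABCABC")); // true: explicit grouping
    }
}
```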

bracket

(package private)  void bracket()
            throws RESyntaxException
Match a bracket {m,n} expression and put the results in the bracket member variables.
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.
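
As a rough illustration of what matching a {m,n} expression involves, here is a minimal standalone sketch. It is not the RECompiler implementation: the class BracketSketch, its methods, the accepted forms {m}, {m,} and {m,n}, and the use of IllegalArgumentException in place of RESyntaxException are all assumptions. Only the value -1 for "unbounded" is taken from the documented bracketUnbounded field.

```java
// Hypothetical standalone parser; it does not touch RECompiler's bracket arrays.
class BracketSketch {
    static final int UNBOUNDED = -1; // mirrors the documented bracketUnbounded value

    // Parses "{m}", "{m,}" or "{m,n}" and returns {min, max};
    // max is UNBOUNDED for the "{m,}" form.
    static int[] parse(String s) {
        if (s.length() < 3 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("Expected {m}, {m,} or {m,n}: " + s);
        }
        String body = s.substring(1, s.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {
            int m = parseCount(body);
            return new int[] { m, m };
        }
        int min = parseCount(body.substring(0, comma));
        String rest = body.substring(comma + 1);
        int max = rest.isEmpty() ? UNBOUNDED : parseCount(rest);
        if (max != UNBOUNDED && max < min) {
            throw new IllegalArgumentException("Maximum below minimum: " + s);
        }
        return new int[] { min, max };
    }

    // Accepts only a non-empty run of ASCII digits.
    private static int parseCount(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("Missing number");
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a number: " + text);
            }
        }
        return Integer.parseInt(text); // overflow surfaces as NumberFormatException
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(parse("{2,5}"))); // [2, 5]
        System.out.println(java.util.Arrays.toString(parse("{2,}")));  // [2, -1]
    }
}
```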

branch

(package private)  int branch(int[] flags)
            throws RESyntaxException
Compile one branch of an or operator (implements concatenation)
Parameters:
flags - Flags passed by reference
Returns:
Pointer to branch node
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.
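
Java passes int arguments by value, so "flags passed by reference" is commonly realised with an int[] that the callee updates in place. The sketch below shows that idiom with the documented NODE_* values. The class FlagsSketch and the method compileClosure are hypothetical, and reading NODE_NULLABLE as "may match the empty string" is an assumption based on the name.

```java
// Hypothetical illustration of the int[] out-parameter idiom.
class FlagsSketch {
    static final int NODE_NORMAL = 0;
    static final int NODE_NULLABLE = 1;
    static final int NODE_TOPLEVEL = 2;

    // Returns a placeholder node index and reports, through flags[0],
    // whether the compiled piece may match the empty string (assumed meaning).
    static int compileClosure(char op, int[] flags) {
        if (op == '*' || op == '?') {
            flags[0] |= NODE_NULLABLE;
        }
        return 0; // placeholder for a real node index
    }

    public static void main(String[] args) {
        int[] flags = { NODE_NORMAL };
        compileClosure('*', flags);
        System.out.println((flags[0] & NODE_NULLABLE) != 0); // true
    }
}
```

The return value stays free for the node index, while the one-element array carries the extra result back to the caller.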

characterClass

(package private)  int characterClass()
            throws RESyntaxException
Compile a character class
Returns:
Index of class node
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.

closure

(package private)  int closure(int[] flags)
            throws RESyntaxException
Compile a possibly closured terminal
Parameters:
flags - Flags passed by reference
Returns:
Index of closured node
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.

compile

public REProgram compile(String pattern)
            throws RESyntaxException
Compiles a regular expression pattern into a program runnable by the pattern matcher class 'RE'.
Parameters:
pattern - Regular expression pattern to compile (see the RE class for a description of the syntax).
Returns:
A compiled regular expression program.
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.
See Also:
RECompiler, RE

emit

(package private)  void emit(char c)
Emit a single character into the program stream.
Parameters:
c - Character to add

ensure

(package private)  void ensure(int n)
Ensures that n more characters can fit in the program buffer. If n more can't fit, then the size is doubled until it can.
Parameters:
n - Number of additional characters to ensure will fit.
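
The doubling strategy described for ensure(), together with emit(), might look roughly like the following. This is a sketch, not the library's code: the class ProgramBuffer and its constructor are hypothetical, and only the field names instruction and lenInstruction are borrowed from the field summary.

```java
// Hypothetical reimplementation of the growth policy described above.
class ProgramBuffer {
    char[] instruction;      // program buffer
    int lenInstruction = 0;  // number of chars currently in use

    ProgramBuffer(int initialCapacity) {
        if (initialCapacity < 1) {
            throw new IllegalArgumentException("Capacity must be at least 1");
        }
        instruction = new char[initialCapacity];
    }

    // Doubles the buffer until n more chars fit after lenInstruction.
    void ensure(int n) {
        int newLength = instruction.length;
        while (newLength < lenInstruction + n) {
            newLength *= 2;
        }
        if (newLength != instruction.length) {
            instruction = java.util.Arrays.copyOf(instruction, newLength);
        }
    }

    // Appends a single char, growing the buffer first if necessary.
    void emit(char c) {
        ensure(1);
        instruction[lenInstruction++] = c;
    }

    public static void main(String[] args) {
        ProgramBuffer buffer = new ProgramBuffer(4);
        for (char c : "hello".toCharArray()) {
            buffer.emit(c);
        }
        System.out.println(buffer.instruction.length); // 8
    }
}
```

Doubling keeps the amortised cost of each emit() constant, at the price of up to half the buffer being unused.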

escape

(package private)  int escape()
            throws RESyntaxException
Match an escape sequence. Handles quoted chars and octal escapes as well as normal escape characters. Always advances the input stream by the right amount. This code "understands" the subtle difference between an octal escape and a backref. When the result is ESC_CLASS, ESC_COMPLEX or ESC_BACKREF, you can find out which specific escape was matched by looking at the pattern character at idx - 1.
Returns:
ESC_* code or character if simple escape
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.

expr

(package private)  int expr(int[] flags)
            throws RESyntaxException
Compile an expression with possible parens around it. Paren matching is done at this level so we can tie the branch tails together.
Parameters:
flags - Flag value passed by reference
Returns:
Node index of expression in instruction array
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.
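
The method names expr, branch, closure and terminal suggest a recursive-descent layering, with paren matching done in expr as noted above. The sketch below illustrates that layering on a tiny grammar and returns the pattern with every closure group marked by angle brackets, instead of emitting program nodes. It is a simplified illustration under stated assumptions, not the RECompiler code: the class DescentSketch, the grammar (literals, parentheses, |, *, +, ?) and the angle-bracket output are all hypothetical.

```java
// Hypothetical recursive-descent sketch; it marks closure groups instead of emitting nodes.
class DescentSketch {
    private final String pattern;
    private int idx = 0;

    private DescentSketch(String pattern) { this.pattern = pattern; }

    // Returns the pattern with each closure operand wrapped as <operand op>.
    static String group(String pattern) {
        DescentSketch p = new DescentSketch(pattern);
        String result = p.expr(true);
        if (p.idx != pattern.length()) {
            throw new IllegalArgumentException("Unmatched ')' at index " + p.idx);
        }
        return result;
    }

    // expr := '(' branch ('|' branch)* ')' below top level, else branch ('|' branch)*
    private String expr(boolean topLevel) {
        boolean paren = !topLevel && pattern.charAt(idx) == '(';
        if (paren) idx++;
        StringBuilder out = new StringBuilder(branch());
        while (idx < pattern.length() && pattern.charAt(idx) == '|') {
            idx++;
            out.append('|').append(branch());
        }
        if (!paren) return out.toString();
        if (idx >= pattern.length() || pattern.charAt(idx) != ')') {
            throw new IllegalArgumentException("Missing ')'");
        }
        idx++;
        return "(" + out + ")";
    }

    // branch := closure*  (concatenation)
    private String branch() {
        StringBuilder out = new StringBuilder();
        while (idx < pattern.length() && pattern.charAt(idx) != '|' && pattern.charAt(idx) != ')') {
            out.append(closure());
        }
        return out.toString();
    }

    // closure := terminal ('*' | '+' | '?')?
    private String closure() {
        String t = terminal();
        if (idx < pattern.length() && "*+?".indexOf(pattern.charAt(idx)) >= 0) {
            return "<" + t + pattern.charAt(idx++) + ">";
        }
        return t;
    }

    // terminal := parenthesised expr | single literal character
    private String terminal() {
        char c = pattern.charAt(idx);
        if (c == '(') return expr(false);
        if ("*+?".indexOf(c) >= 0) {
            throw new IllegalArgumentException("Closure operator without operand at index " + idx);
        }
        idx++;
        return String.valueOf(c);
    }

    public static void main(String[] args) {
        System.out.println(group("ABC*"));   // AB<C*>
        System.out.println(group("(ABC)*")); // <(ABC)*>
    }
}
```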

internalError

(package private)  void internalError()
            throws Error
Throws a new internal error exception

node

(package private)  int node(char opcode,
                            int opdata)
Adds a new node
Parameters:
opcode - Opcode for node
opdata - Opdata for node (only the low 16 bits are currently used)
Returns:
Index of new node in program

nodeInsert

(package private)  void nodeInsert(char opcode,
                                   int opdata,
                                   int insertAt)
Inserts a node with a given opcode and opdata at insertAt. The node relative next pointer is initialized to 0.
Parameters:
opcode - Opcode for new node
opdata - Opdata for new node (only the low 16 bits are currently used)
insertAt - Index at which to insert the new node in the program
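
To make the node() and nodeInsert() descriptions concrete, the sketch below assumes that each node occupies three chars (opcode, opdata, relative next pointer). That layout is an assumption, since this page does not spell it out. The class NodeInsertSketch and the helper grow are hypothetical.

```java
// Hypothetical sketch; the three-char node layout is an assumption.
class NodeInsertSketch {
    static final int NODE_SIZE = 3; // assumed layout: [opcode, opdata, next]
    char[] instruction = new char[16];
    int lenInstruction = 0;

    // Appends a node at the end of the program and returns its index.
    int node(char opcode, int opdata) {
        grow(NODE_SIZE);
        int index = lenInstruction;
        write(index, opcode, opdata);
        lenInstruction += NODE_SIZE;
        return index;
    }

    // Shifts everything from insertAt onward up by one node, then writes the new node.
    void nodeInsert(char opcode, int opdata, int insertAt) {
        grow(NODE_SIZE);
        System.arraycopy(instruction, insertAt, instruction,
                         insertAt + NODE_SIZE, lenInstruction - insertAt);
        write(insertAt, opcode, opdata);
        lenInstruction += NODE_SIZE;
    }

    private void write(int index, char opcode, int opdata) {
        instruction[index] = opcode;
        instruction[index + 1] = (char) opdata; // keeps only the low 16 bits
        instruction[index + 2] = 0;             // relative next pointer starts at 0
    }

    private void grow(int n) {
        if (lenInstruction + n > instruction.length) {
            int newLength = Math.max(instruction.length * 2, lenInstruction + n);
            instruction = java.util.Arrays.copyOf(instruction, newLength);
        }
    }

    public static void main(String[] args) {
        NodeInsertSketch p = new NodeInsertSketch();
        p.node('A', 1);
        p.node('B', 2);
        p.nodeInsert('X', 0x12345, 0);
        System.out.println(p.instruction[0]);                        // X
        System.out.println(Integer.toHexString(p.instruction[1]));   // 2345
    }
}
```

Note that inserting a node moves every later node, so any absolute index held by the caller for a node at or after insertAt is stale afterwards.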

reallocBrackets

(package private)  void reallocBrackets()
Enlarge storage for brackets only as needed.

setNextOfEnd

(package private)  void setNextOfEnd(int node,
                                     int pointTo)
Appends a node to the end of a node chain
Parameters:
node - Start of node chain to traverse
pointTo - Node to have the tail of the chain point to
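
Appending to a node chain can be pictured as walking the next pointers until a node with no successor is found, then pointing that node at the target. The sketch below assumes a three-char node layout with a signed, relative next pointer in the third slot and 0 meaning "end of chain"; those details are assumptions, as are the class ChainSketch and the constant OFFSET_NEXT. It omits any bounds or cycle checks.

```java
// Hypothetical sketch; node layout and pointer encoding are assumptions.
class ChainSketch {
    static final int OFFSET_NEXT = 2; // assumed slot of the relative next pointer

    // Follows the chain starting at node and makes its tail point to pointTo.
    static void setNextOfEnd(char[] program, int node, int pointTo) {
        int next = (short) program[node + OFFSET_NEXT]; // read as signed offset
        while (next != 0) {
            node += next;
            next = (short) program[node + OFFSET_NEXT];
        }
        // Store the target relative to the tail node, truncated to 16 bits.
        program[node + OFFSET_NEXT] = (char) (short) (pointTo - node);
    }

    public static void main(String[] args) {
        char[] program = new char[9]; // three nodes at indices 0, 3 and 6
        program[0 + OFFSET_NEXT] = 3; // node 0 -> node 3
        setNextOfEnd(program, 0, 6);  // the tail (node 3) now points to node 6
        System.out.println((int) program[3 + OFFSET_NEXT]); // 3
    }
}
```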

syntaxError

(package private)  void syntaxError(String s)
            throws RESyntaxException
Throws a new syntax error exception
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.

terminal

(package private)  int terminal(int[] flags)
            throws RESyntaxException
Match a terminal node.
Parameters:
flags - Flags passed by reference
Returns:
Index of terminal node (closeable)
Throws:
RESyntaxException - Thrown if the regular expression has invalid syntax.

Copyright © 2001-2003 Apache Software Foundation. All Rights Reserved.