How QuickREx evaluates regular expressions

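QuickREx offers the most widely used implementations of regular expressions in the Java world: the JDK variant (java.util.regex) and the ORO variants, which use either a Perl or an Awk compiler.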

Since the APIs of the variants differ slightly, a common abstraction holds the information about matches and groups. The following interface represents a regular expression that has been evaluated against a specific text, yielding a number of matches and groups:

package de.babe.eclipse.plugins.quickREx.regexp;

/**
 * Abstracts matches in a text.
 * 
 * @author bastian.bergerhoff
 */
public interface MatchSet {

  /**
   * Returns true if and only if there is a next match
   * in this MatchSet. Acts like next() in
   * an enumeration in that it causes the whole instance state to be 
   * centered around the next match.
   * 
   * @return true if and only if there is a next match
   */
  public boolean nextMatch();

  /**
   * Returns the start-offset of the current match.
   * 
   * @return the start-offset of the current match
   */
  public int start();

  /**
   * Returns the end-offset of the current match.
   * 
   * @return the end-offset of the current match
   */
  public int end();

  /**
   * Returns the number of groups in the current match.
   * 0 is returned if there are no groups - the match itself
   * does not count as a group.
   * 
   * @return the number of groups in the current match
   */
  public int groupCount();

  /**
   * Returns the String-contents of the group with the passed
   * index. 
   *  
   * @param groupIndex the index of the group
   * @return the String-contents of the group
   */
  public String groupContents(int groupIndex);

  /**
   * Returns the start-offset of the group with the passed
   * index.
   * 
   * @param groupIndex the index of the group
   * @return the start-offset of the group
   */
  public int groupStart(int groupIndex);

  /**
   * Returns the end-offset of the group with the passed
   * index.
   * 
   * @param groupIndex the index of the group
   * @return the end-offset of the group
   */
  public int groupEnd(int groupIndex);

}
		

There is one implementation for the JDK variant, plus an abstract base implementation with two concrete implementations for the ORO variants. The latter two differ only in their constructors, which use an Awk or a Perl compiler as requested. As an example, consider the implementation for the JDK variant:

package de.babe.eclipse.plugins.quickREx.regexp.jdk;

import java.util.Collection;
import java.util.Iterator;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import de.babe.eclipse.plugins.quickREx.regexp.Flag;
import de.babe.eclipse.plugins.quickREx.regexp.MatchSet;

/**
 * MatchSet using JDK-regular expressions.
 * 
 * @author bastian.bergerhoff, andreas.studer
 */
public class JavaMatchSet implements MatchSet {

  private final Pattern pattern;
  private final Matcher matcher;

  private final static Collection flags = new Vector();
  
  static {
    flags.add(JavaFlag.JDK_CANON_EQ);
    flags.add(JavaFlag.JDK_CASE_INSENSITIVE);
    flags.add(JavaFlag.JDK_COMMENTS);
    flags.add(JavaFlag.JDK_DOTALL);
    flags.add(JavaFlag.JDK_MULTILINE);
    flags.add(JavaFlag.JDK_UNICODE_CASE);
    flags.add(JavaFlag.JDK_UNIX_LINES);
  }
  
  /**
   * Returns a Collection of all Compiler-Flags the JDK-implementation
   * knows about.
   * 
   * @return a Collection of all Compiler-Flags the JDK-implementation
   * knows about
   */
  public static Collection getAllFlags() {
    return flags;
  }
  
  /**
   * The constructor - uses JDK-regular expressions
   * to evaluate the passed regular expression against
   * the passed text.
   * 
   * @param regExp the regular expression
   * @param text the text to evaluate regExp against
   * @param flags a Collection of Flags to pass to the Compiler
   */
  public JavaMatchSet(String regExp, String text, Collection flags) {
    int iFlags = 0;
    for (Iterator iter = flags.iterator(); iter.hasNext();) {
      Flag element = (Flag)iter.next();
      iFlags = iFlags | element.getFlag();
    }
    pattern = Pattern.compile(regExp, iFlags);
    matcher = pattern.matcher(text);
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#nextMatch()
   */
  public boolean nextMatch() {
    return matcher.find();
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#start()
   */
  public int start() {
    return matcher.start();
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#end()
   */
  public int end() {
    return matcher.end();
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#groupCount()
   */
  public int groupCount() {
    return matcher.groupCount();
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#groupContents(int)
   */
  public String groupContents(int groupIndex) {
    return matcher.group(groupIndex);
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#groupStart(int)
   */
  public int groupStart(int groupIndex) {
    return matcher.start(groupIndex);
  }

  /* (non-Javadoc)
   * @see de.babe.eclipse.plugins.quickREx.regexp.MatchSet#groupEnd(int)
   */
  public int groupEnd(int groupIndex) {
    return matcher.end(groupIndex);
  }
}
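
The two steps that JavaMatchSet delegates to the JDK, combining the flag values with a bitwise OR and advancing with Matcher.find(), can be tried in isolation. The following is only a minimal sketch: the class JdkMatchDemo and its methods combineFlags and describeMatches are hypothetical names that are not part of QuickREx, and the plain Pattern constants stand in for the values that the plug-in's Flag objects would presumably return from getFlag().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of QuickREx.
class JdkMatchDemo {

  /** Combines individual flag values into one bit mask using bitwise OR. */
  static int combineFlags(int[] flagValues) {
    int combined = 0;
    for (int value : flagValues) {
      combined |= value;
    }
    return combined;
  }

  /** Collects "start-end:text" for every match of regExp in text. */
  static List<String> describeMatches(String regExp, String text, int flags) {
    Matcher matcher = Pattern.compile(regExp, flags).matcher(text);
    List<String> result = new ArrayList<>();
    // find() moves the matcher to the next match, like nextMatch() does
    while (matcher.find()) {
      result.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
    }
    return result;
  }

  public static void main(String[] args) {
    int flags = combineFlags(new int[] { Pattern.CASE_INSENSITIVE, Pattern.MULTILINE });
    // prints [0-3:Abb, 4-6:ab]
    System.out.println(describeMatches("^a(b+)", "Abb\nab", flags));
  }
}
```

With MULTILINE set, ^ also matches after the line break, and CASE_INSENSITIVE lets the pattern match the upper-case A, which is why both lines produce a match here.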
		

A MatchSet is then used to loop over all matches and to collect each match together with its groups:

MatchSet matches = MatchSetFactory.createMatchSet(QuickRExPlugin.getDefault().getREFlavour(), p_RegExp, p_testText, flags);
matchData = new Vector();
while (matches.nextMatch()) {
	Match match = new Match(matches.start(), matches.end());
	for (int g = 0; g<matches.groupCount(); g++) {
		match.addGroup(new Group(g+1, matches.groupContents(g+1), matches.groupStart(g+1), matches.groupEnd(g+1)));
	}
	matchData.add(match);
}
		

where Match and Group are abstractions of matches and groups.
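
The Match and Group classes themselves are not shown here. As a rough illustration of what such abstractions might look like, the following sketch defines simple value objects and fills them with the same loop, using java.util.regex directly instead of a MatchSet. The field names, the getGroups() accessor and the MatchCollector helper are assumptions made for this example; the actual QuickREx classes may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value object for one capturing group. */
class Group {
  final int index;
  final String contents;
  final int start;
  final int end;

  Group(int index, String contents, int start, int end) {
    this.index = index;
    this.contents = contents;
    this.start = start;
    this.end = end;
  }
}

/** Hypothetical value object for one match and its groups. */
class Match {
  final int start;
  final int end;
  private final List<Group> groups = new ArrayList<>();

  Match(int start, int end) {
    this.start = start;
    this.end = end;
  }

  void addGroup(Group group) {
    groups.add(group);
  }

  List<Group> getGroups() {
    return groups;
  }
}

/** Hypothetical helper that mirrors the loop above for the JDK variant. */
class MatchCollector {

  static List<Match> collect(String regExp, String text) {
    Matcher matcher = Pattern.compile(regExp).matcher(text);
    List<Match> matchData = new ArrayList<>();
    while (matcher.find()) {
      Match match = new Match(matcher.start(), matcher.end());
      // Group 0 is the match itself, so capturing groups start at index 1
      for (int g = 1; g <= matcher.groupCount(); g++) {
        match.addGroup(new Group(g, matcher.group(g), matcher.start(g), matcher.end(g)));
      }
      matchData.add(match);
    }
    return matchData;
  }
}
```

Note that with the JDK implementation a group that did not take part in a match has null contents and a start-offset of -1, so code that displays groups should be prepared for both.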