module ScopedSearch::QueryLanguage::Tokenizer

The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, which are easier to parse than the raw string.
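As a rough illustration of how the pieces fit together, the Ruby sketch below mixes a cut-down tokenizer module into a class that includes Enumerable. It assumes, as the call to to_a in tokenize implies, that the host class includes Enumerable, so that to_a drives the each alias. The names MiniTokenizer and MiniQuery and the small KEYWORDS and OPERATORS subsets are hypothetical. This is not the scoped_search implementation, which also handles commas, quoted keywords and operators such as ~ and ^.

```ruby
# Hypothetical cut-down tokenizer mixin; not the scoped_search code.
module MiniTokenizer
  # Hypothetical subsets of the real lookup tables.
  KEYWORDS  = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze
  OPERATORS = { '=' => :eq, '!=' => :ne, '>' => :gt, '>=' => :gte,
                '<' => :lt, '<=' => :lte, '&' => :and, '|' => :or }.freeze

  # Rewind the pointer, then let Enumerable#to_a collect what #each yields.
  def tokenize
    @current_char_pos = -1
    to_a
  end

  def each_token(&block)
    while next_char
      case @current_char
      when /^\s?$/    then nil # skip whitespace and the final empty slice
      when '('        then yield(:lparen)
      when ')'        then yield(:rparen)
      when /[=<>!&|]/ then tokenize_operator(&block)
      else                 tokenize_keyword(&block)
      end
    end
  end
  alias each each_token

  private

  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  def peek_char
    @str[@current_char_pos + 1, 1].to_s
  end

  # Prefer a two-character operator when the pair is a known key.
  def tokenize_operator
    operator = @current_char.dup
    operator << next_char.to_s if OPERATORS.key?(operator + peek_char)
    yield(OPERATORS[operator])
  end

  # Extend the word until a stop character; map reserved words to symbols.
  def tokenize_keyword
    keyword = @current_char.dup
    keyword << next_char while /[^=<>!&|\s()]/ =~ peek_char
    yield(KEYWORDS.fetch(keyword.downcase, keyword))
  end
end

class MiniQuery
  include Enumerable # provides to_a on top of #each
  include MiniTokenizer

  def initialize(str)
    @str = str
  end
end

p MiniQuery.new('name = foo AND (age>=18)').tokenize
# prints ["name", :eq, "foo", :and, :lparen, "age", :gte, "18", :rparen]
```

Keeping the tokenizer as a mixin means the host class only has to supply @str; Enumerable then provides to_a (and the rest of its methods) for free.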

Constants

KEYWORDS

All keywords that the language supports, as a hash that maps each lowercase keyword to the Symbol it is tokenized into.

OPERATORS

All operators that the language supports, as a hash that maps each operator string to the Symbol it is tokenized into.
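Both constants are hashes from a string to a Symbol, as the lookups in tokenize_keyword and tokenize_operator show. The sketch below uses small hypothetical subsets (the real hashes contain more entries) and a hypothetical keyword_token helper to show the lookup style: keywords are matched after downcasing, words that are not reserved pass through unchanged as strings, and operators are looked up by their exact text.

```ruby
# Hypothetical subsets, for illustration only; the real hashes are larger.
KEYWORDS  = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze
OPERATORS = { '=' => :eq, '!=' => :ne, '<' => :lt, '<=' => :lte }.freeze

# Keyword lookup as in tokenize_keyword: downcase first, and fall back to
# the original word when it is not a reserved keyword.
def keyword_token(word)
  KEYWORDS.key?(word.downcase) ? KEYWORDS[word.downcase] : word
end

p keyword_token('AND')    # prints :and
p keyword_token('Name')   # prints "Name"
p OPERATORS['<=']         # prints :lte
p OPERATORS['=>']         # prints nil (not an operator in this subset)
```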

Public Instance Methods

current_char()

Returns the current character of the string.

# File lib/scoped_search/query_language/tokenizer.rb, line 19
def current_char
  @current_char
end
each(&block)
Alias for: each_token
each_token() { |:lparen| ... }

Tokenizes the string by iterating over its characters and yielding each token to the given block.

# File lib/scoped_search/query_language/tokenizer.rb, line 37
def each_token(&block)
  while next_char
    case current_char
    when /^\s?$/; # ignore
    when '(';  yield(:lparen)
    when ')';  yield(:rparen)
    when ',';  yield(:comma)
    when /\&|\||=|<|>|\^|!|~|-/;  tokenize_operator(&block)
    when '"';                  tokenize_quoted_keyword(&block)
    else;                      tokenize_keyword(&block)
    end
  end
end
Also aliased as: each
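The case statement in each_token chooses a handler from the first character of a token alone. The sketch below copies its conditions into a hypothetical dispatch_for helper that returns a label instead of calling the handler, which makes the routing easy to inspect. Note that a - only starts an operator at the beginning of a token: the stop-character class in tokenize_keyword does not include -, so inside a word it is kept as part of the keyword.

```ruby
# Mirrors the branch selection in each_token: given one character, report
# which handler the tokenizer would dispatch to. The returned symbols are
# just labels for this sketch.
def dispatch_for(char)
  case char
  when /^\s?$/                then :ignore
  when '('                    then :lparen
  when ')'                    then :rparen
  when ','                    then :comma
  when /\&|\||=|<|>|\^|!|~|-/ then :tokenize_operator
  when '"'                    then :tokenize_quoted_keyword
  else                             :tokenize_keyword
  end
end

[' ', '', '(', ',', '-', '~', '"', 'x'].each do |c|
  puts "#{c.inspect} -> #{dispatch_for(c)}"
end
```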
next_char()

Moves the position pointer one step forward and returns the character at the new position.

# File lib/scoped_search/query_language/tokenizer.rb, line 31
def next_char
  @current_char_pos += 1
  @current_char = @str[@current_char_pos, 1]
end
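next_char relies on the slicing behaviour of Ruby's String#[](start, length): when start equals the string's length it returns an empty string, and beyond that it returns nil. The standalone next_chars helper below (hypothetical, not part of the module) repeats the same arithmetic. It shows why the while next_char loop in each_token sees one trailing empty string, which the whitespace branch ignores, before nil ends the loop.

```ruby
# Hypothetical standalone loop reproducing next_char's arithmetic.
def next_chars(str)
  pos = -1
  chars = []
  loop do
    pos += 1
    char = str[pos, 1] # "" when pos == str.length, nil when pos > str.length
    break if char.nil?
    chars << char
  end
  chars
end

p next_chars("ab") # prints ["a", "b", ""]
```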
peek_char(amount = 1)

Returns the character amount positions after the current one (by default, the next character) without moving the position pointer.

# File lib/scoped_search/query_language/tokenizer.rb, line 25
def peek_char(amount = 1)
  @str[@current_char_pos + amount, 1]
end
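To see that peek_char reads ahead without moving the pointer, the hypothetical Cursor class below copies next_char and peek_char as shown here and adds a reader for the current character. Like next_char, peek_char returns nil once the requested position lies beyond the end of the string.

```ruby
# Hypothetical minimal cursor with the same readers as the Tokenizer.
class Cursor
  attr_reader :current_char

  def initialize(str)
    @str = str
    @current_char_pos = -1
  end

  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end
end

c = Cursor.new(">=x")
c.next_char        # => ">"
p c.peek_char      # prints "="
p c.peek_char(2)   # prints "x"
p c.current_char   # prints ">" (peeking did not advance)
p c.peek_char(5)   # prints nil (past the end of the string)
```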
tokenize()

Resets the position pointer, tokenizes the string, and returns the result as an array of tokens.

# File lib/scoped_search/query_language/tokenizer.rb, line 13
def tokenize
  @current_char_pos = -1
  to_a
end
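tokenize resets @current_char_pos to -1 before calling to_a, so every call starts from the first character and the same object can be tokenized more than once. The hypothetical CharStream class below keeps only that mechanism, yielding single characters instead of real tokens.

```ruby
# Hypothetical character-level stream that keeps only the parts needed to
# show why tokenize resets the position pointer before calling to_a.
class CharStream
  include Enumerable

  def initialize(str)
    @str = str
  end

  def each
    while (char = next_char)
      yield char unless char.empty? # drop the final "" slice
    end
  end

  def next_char
    @current_char_pos += 1
    @str[@current_char_pos, 1]
  end

  def tokenize
    @current_char_pos = -1 # rewind so each call starts at the first character
    to_a
  end
end

s = CharStream.new("ab")
p s.tokenize # prints ["a", "b"]
p s.tokenize # prints ["a", "b"] again, thanks to the reset
```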
tokenize_keyword() { |KEYWORDS| ... }

Tokenizes a keyword and converts it to a Symbol if it is recognized, case-insensitively, as a reserved language keyword (the KEYWORDS hash).

# File lib/scoped_search/query_language/tokenizer.rb, line 63
def tokenize_keyword(&block)
  keyword = current_char
  keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
  KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
end
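The following standalone sketch reproduces the keyword loop on a plain string, using the same stop-character class as tokenize_keyword and the same case-insensitive lookup. The read_keyword helper, its [token, next_index] return value, the STOP constant and the three-entry KEYWORDS hash are all hypothetical.

```ruby
# Hypothetical subset of KEYWORDS, for illustration only.
KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze

# Same character class as tokenize_keyword: a keyword ends before any
# of = ~ < > whitespace & | ) ( ,
STOP = /[^=~<>\s\&\|\)\(,]/

# Reads one keyword starting at index `start`; returns [token, next_index].
def read_keyword(str, start)
  keyword = str[start].dup
  pos = start
  keyword << str[pos += 1] while STOP =~ str[pos + 1].to_s
  token = KEYWORDS.key?(keyword.downcase) ? KEYWORDS[keyword.downcase] : keyword
  [token, pos + 1]
end

p read_keyword("Name~foo", 0) # prints ["Name", 4]
p read_keyword("AND x", 0)    # prints [:and, 3]
p read_keyword("a-b,c", 0)    # prints ["a-b", 3]
```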
tokenize_operator() { |OPERATORS| ... }

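Tokenizes an operator that occurs in the OPERATORS hash. If the current character followed by the next one is also a key in OPERATORS, the two-character operator is consumed instead. The .to_s calls on peek_char and next_char work around a Ruby bug in which nil values are returned from strings with a forced encoding; see github.com/wvanbergen/scoped_search/issues/33 for details.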

# File lib/scoped_search/query_language/tokenizer.rb, line 55
def tokenize_operator(&block)
  operator = current_char
  operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
  yield(OPERATORS[operator])
end
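tokenize_operator uses one character of lookahead: it consumes the next character only when the pair is itself a key in OPERATORS, so the longer of a one- or two-character operator wins. The read_operator helper and the OPERATORS subset below are hypothetical; as in the original, .to_s is applied to the lookahead character, which here keeps the sketch from appending nil when the operator is the last character of the input.

```ruby
# Hypothetical subset of OPERATORS, for illustration only.
OPERATORS = { '>' => :gt, '>=' => :gte, '!' => :not, '!=' => :ne, '=' => :eq }.freeze

# Same lookahead as tokenize_operator: take the current character, and also
# consume the next one only if the pair is a known operator.
# Returns [symbol, next_index].
def read_operator(str, start)
  operator = str[start].dup
  operator << str[start + 1].to_s if OPERATORS.key?(operator + str[start + 1].to_s)
  [OPERATORS[operator], start + operator.length]
end

p read_operator(">=5", 0)  # prints [:gte, 2]
p read_operator(">5", 0)   # prints [:gt, 1]
p read_operator(">", 0)    # prints [:gt, 1]
p read_operator("a!=b", 1) # prints [:ne, 3]
```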
tokenize_quoted_keyword() { |keyword| ... }

Tokenizes a keyword that is enclosed in double quotes. A backslash makes the character after it literal, which allows double quotes (and backslashes) to be included in the keyword.

# File lib/scoped_search/query_language/tokenizer.rb, line 71
def tokenize_quoted_keyword(&block)
  keyword = ""
  until next_char.nil? || current_char == '"'
    keyword << (current_char == "\\" ? next_char : current_char)
  end
  yield(keyword)
end
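The sketch below applies the same rules as tokenize_quoted_keyword to a plain string: starting after the opening quote, a backslash makes the following character literal, and reading stops at the next unescaped double quote or at the end of input. The read_quoted helper is hypothetical.

```ruby
# Reads a double-quoted keyword whose opening quote is at index `start`.
def read_quoted(str, start)
  keyword = "".dup
  pos = start
  loop do
    pos += 1
    char = str[pos]
    break if char.nil? || char == '"' # closing quote or end of input
    if char == "\\"
      pos += 1
      char = str[pos].to_s            # escaped character; "" after a trailing backslash
    end
    keyword << char
  end
  keyword
end

p read_quoted('"say \"hi\"" rest', 0) # prints "say \"hi\""
p read_quoted('"a\\b"', 0)            # prints "ab"
p read_quoted('"unterminated', 0)     # prints "unterminated"
```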