module ScopedSearch::QueryLanguage::Tokenizer
The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, a form that is easier to parse than the raw string.
Constants
- KEYWORDS
Every keyword the language supports.
- OPERATORS
Every operator the language supports.
Public Instance Methods
Returns the current character of the string.
# File lib/scoped_search/query_language/tokenizer.rb, line 19
def current_char
  @current_char
end
Tokenizes the string by iterating over the characters.
# File lib/scoped_search/query_language/tokenizer.rb, line 37
def each_token(&block)
  while next_char
    case current_char
    when /^\s?$/; # ignore
    when '('; yield(:lparen)
    when ')'; yield(:rparen)
    when ','; yield(:comma)
    when /\&|\||=|<|>|\^|!|~|-/; tokenize_operator(&block)
    when '"'; tokenize_quoted_keyword(&block)
    else; tokenize_keyword(&block)
    end
  end
end
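The dispatch in each_token depends only on the current character, so it can be illustrated in isolation. The sketch below is not part of scoped_search: branch_for is a hypothetical helper that reuses the same patterns and returns a label for the branch a character would take.

```ruby
# Hypothetical helper (not part of scoped_search): names the branch of the
# case statement in each_token that a single character would take.
def branch_for(char)
  case char
  when /^\s?$/ then :ignored               # whitespace or the empty string
  when '(' then :lparen
  when ')' then :rparen
  when ',' then :comma
  when /\&|\||=|<|>|\^|!|~|-/ then :operator
  when '"' then :quoted_keyword
  else :keyword                            # anything else starts a keyword
  end
end

branch_for(" ")   # => :ignored
branch_for("")    # => :ignored
branch_for("!")   # => :operator
branch_for("a")   # => :keyword
```

The first pattern also matches the empty string. That matters because a one-character slice taken at exactly the end of a Ruby string is "" rather than nil, so the loop sees one empty "character" before next_char finally returns nil.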
Returns the next character of the string, and moves the position pointer one step forward.
# File lib/scoped_search/query_language/tokenizer.rb, line 31
def next_char
  @current_char_pos += 1
  @current_char = @str[@current_char_pos, 1]
end
Returns a following character of the string (by default, the next character), without updating the position pointer.
# File lib/scoped_search/query_language/tokenizer.rb, line 25
def peek_char(amount = 1)
  @str[@current_char_pos + amount, 1]
end
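To see how current_char, next_char and peek_char interact, here is a minimal sketch. Cursor is a hypothetical stand-in class, not something scoped_search defines; it copies the three method bodies shown above and starts the position at -1, the value that tokenize sets.

```ruby
# Hypothetical stand-in for the object the Tokenizer methods are mixed into.
class Cursor
  attr_reader :current_char

  def initialize(str)
    @str = str
    @current_char_pos = -1 # same starting position that tokenize uses
  end

  # Advances the pointer and returns the character now under it.
  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  # Looks ahead without moving the pointer.
  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end
end

cursor = Cursor.new("a=b")
cursor.next_char     # => "a"
cursor.peek_char     # => "="
cursor.peek_char(2)  # => "b"
cursor.current_char  # => "a" (peeking did not move the pointer)
cursor.next_char     # => "="
cursor.next_char     # => "b"
cursor.next_char     # => ""  (position equals the string length)
cursor.next_char     # => nil (position is past the end)
```

The last two calls show standard String#[] behaviour: a slice that starts exactly at the end of the string is empty, and only a start beyond the end yields nil.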
Tokenizes the string and returns the result as an array of tokens.
# File lib/scoped_search/query_language/tokenizer.rb, line 13
def tokenize
  @current_char_pos = -1
  to_a
end
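tokenize calls to_a, which none of the methods listed on this page define. One plausible arrangement, assumed here rather than confirmed by this page, is that the including class mixes in Enumerable and exposes each_token under the name each. The sketch below shows that pattern with a hypothetical WordStream class whose each_token is a toy that only splits on whitespace.

```ruby
# Hypothetical class illustrating how to_a can be built on each_token.
class WordStream
  include Enumerable # provides to_a in terms of each

  def initialize(str)
    @str = str
  end

  # Toy replacement for the real each_token: yields whitespace-separated words.
  def each_token
    @str.split.each { |word| yield(word) }
  end
  alias each each_token # assumption: Enumerable needs a method named each

  def tokenize
    to_a
  end
end

WordStream.new("color = red").tokenize # => ["color", "=", "red"]
```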
Tokenizes a keyword, and converts it to a Symbol if it is recognized as a reserved language keyword (a key in the KEYWORDS hash). The lookup is case-insensitive.
# File lib/scoped_search/query_language/tokenizer.rb, line 63
def tokenize_keyword(&block)
  keyword = current_char
  keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
  KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
end
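The two steps in tokenize_keyword, reading characters up to a delimiter and then looking the word up, can be sketched separately. Everything named below is hypothetical: SKETCH_KEYWORDS is an illustrative subset rather than the library's actual KEYWORDS hash, and leading_keyword and classify_word are helpers invented for this example.

```ruby
# Illustrative subset only; the real KEYWORDS hash is not shown on this page.
SKETCH_KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze

# Returns the run of characters at the start of str that tokenize_keyword
# would consume, using the same delimiter set as its regular expression.
def leading_keyword(str)
  str[/\A[^=~<>\s\&\|\)\(,]+/]
end

# Maps a word to its Symbol if it is a reserved keyword, else returns the word.
def classify_word(word)
  SKETCH_KEYWORDS.fetch(word.downcase, word)
end

leading_keyword("name=value")   # => "name"
leading_keyword("foo-bar baz")  # => "foo-bar"
classify_word("AND")            # => :and
classify_word("Ruby")           # => "Ruby"
```

Note that "-" and "!" are not in the delimiter set, so they end up inside a keyword when they appear after its first character, even though each_token treats them as operators at the start of a token.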
Tokenizes an operator that occurs in the OPERATORS hash. The .to_s calls on peek_char and next_char guard against a Ruby bug in which nil values are returned from strings that have a forced encoding. See github.com/wvanbergen/scoped_search/issues/33 for details.
# File lib/scoped_search/query_language/tokenizer.rb, line 55
def tokenize_operator(&block)
  operator = current_char
  operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
  yield(OPERATORS[operator])
end
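tokenize_operator uses one character of lookahead to prefer a two-character operator over a one-character one. The following sketch shows that rule on its own. SKETCH_OPERATORS is an illustrative subset, not the library's OPERATORS hash, and read_operator is a hypothetical helper that takes the string and position as arguments instead of using instance state.

```ruby
# Illustrative subset only; the real OPERATORS hash is not shown on this page.
SKETCH_OPERATORS = {
  '!' => :not, '=' => :eq, '!=' => :ne, '>' => :gt, '>=' => :gte
}.freeze

# Returns [token, length] for the operator starting at str[pos].
def read_operator(str, pos)
  operator  = str[pos, 1]
  following = str[pos + 1, 1].to_s # to_s turns a nil lookahead into ""
  # Extend to two characters only if the pair is itself a known operator.
  operator += following if SKETCH_OPERATORS.key?(operator + following)
  [SKETCH_OPERATORS[operator], operator.length]
end

read_operator("a>=1", 1)  # => [:gte, 2]
read_operator("a>1", 1)   # => [:gt, 1]
read_operator("a!=b", 1)  # => [:ne, 2]
```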
Tokenizes a keyword that is quoted using double quotes. Allows escaping of double quote characters by backslashes.
# File lib/scoped_search/query_language/tokenizer.rb, line 71
def tokenize_quoted_keyword(&block)
  keyword = ""
  until next_char.nil? || current_char == '"'
    keyword << (current_char == "\\" ? next_char : current_char)
  end
  yield(keyword)
end
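The escaping rule in tokenize_quoted_keyword can be tried out with a standalone sketch. read_quoted is a hypothetical helper, not part of scoped_search. It takes the string and the index of the opening quote, and it adds a defensive to_s that the original method does not have.

```ruby
# Reads a double-quoted keyword; pos is the index of the opening quote.
def read_quoted(str, pos)
  keyword = +""
  loop do
    pos += 1
    char = str[pos, 1]
    break if char.nil? || char == '"' # end of input or closing quote
    if char == "\\"                   # backslash: take the next character literally
      pos += 1
      char = str[pos, 1]
    end
    keyword << char.to_s
  end
  keyword
end

read_quoted('"say \"hi\"" rest', 0) # => 'say "hi"'
read_quoted('"unterminated', 0)     # => 'unterminated'
```

As in the original method, a missing closing quote is not an error: the keyword simply runs to the end of the string.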