module Escape

The Escape module provides several escape functions.

Public Instance Methods

html_attr(str)

#html_attr encodes a string as a double-quoted HTML attribute using character references.

Escape.html_attr("abc") #=> "\"abc\""
Escape.html_attr("a&b") #=> "\"a&amp;b\""
Escape.html_attr("ab&<>\"c") #=> "\"ab&amp;&lt;&gt;&quot;c\""
Escape.html_attr("a'c") #=> "\"a'c\""

It escapes 4 characters:

  • '&' to '&amp;'

  • '<' to '&lt;'

  • '>' to '&gt;'

  • '"' to '&quot;'

# File lib/escape.rb, line 244
def html_attr(str)
  '"' + str.gsub(/[&<>"]/) {|ch| HTML_ATTR_ESCAPE_HASH[ch] } + '"'
end
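
As a usage illustration, the following minimal sketch builds a tag with an escaped attribute value. HTML_ATTR_ESCAPE_HASH is not shown in this listing, so its contents here are an assumption taken from the four-character list above, and EscapeSketch is a hypothetical stand-in module, not the library itself.

```ruby
# Hypothetical stand-in for the library. The hash contents are assumed
# from the documented list of four escaped characters.
module EscapeSketch
  HTML_ATTR_ESCAPE_HASH = {
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
  }.freeze

  # Escapes the four characters and wraps the result in double quotes.
  def self.html_attr(str)
    '"' + str.gsub(/[&<>"]/) { |ch| HTML_ATTR_ESCAPE_HASH[ch] } + '"'
  end
end

title = 'Tom & "Jerry"'
# The returned string already includes the surrounding double quotes,
# so no extra quotes are written in the template.
tag = "<a title=#{EscapeSketch.html_attr(title)}>link</a>"
puts tag
# prints: <a title="Tom &amp; &quot;Jerry&quot;">link</a>
```

Because the result carries its own quotes, the caller should not add another pair around it.
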
html_form(pairs, sep='&')

#html_form composes HTML form key-value pairs into an x-www-form-urlencoded string.

#html_form takes an array of pairs of strings, or a hash from string to string.

Escape.html_form([["a","b"], ["c","d"]]) #=> "a=b&c=d"
Escape.html_form({"a"=>"b", "c"=>"d"}) #=> "a=b&c=d"

In the array form, the same key can be used more than once. (This is required for an HTML form that contains checkboxes or a select element with the multiple attribute.)

Escape.html_form([["k","1"], ["k","2"]]) #=> "k=1&k=2"

If the strings contain characters that must be escaped in x-www-form-urlencoded, they are escaped using %-encoding.

Escape.html_form([["k=","&;="]]) #=> "k%3D=%26%3B%3D"

The separator can be specified by the optional second argument.

Escape.html_form([["a","b"], ["c","d"]], ";") #=> "a=b;c=d"

See HTML 4.01 for details.

# File lib/escape.rb, line 164
def html_form(pairs, sep='&')
  r = ''
  first = true
  pairs.each {|k, v|
    # query-chars - pct-encoded - x-www-form-urlencoded-delimiters =
    #   unreserved / "!" / "$" / "'" / "(" / ")" / "*" / "," / ":" / "@" / "/" / "?"
    # query-char - pct-encoded = unreserved / sub-delims / ":" / "@" / "/" / "?"
    # query-char = pchar / "/" / "?" = unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?"
    # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    # x-www-form-urlencoded-delimiters = "&" / "+" / ";" / "="
    r << sep if !first
    first = false
    k.each_byte {|byte|
      ch = byte.chr
      if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
        r << "%" << ch.unpack("H2")[0].upcase
      else
        r << ch
      end
    }
    r << '='
    v.each_byte {|byte|
      ch = byte.chr
      if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
        r << "%" << ch.unpack("H2")[0].upcase
      else
        r << ch
      end
    }
  }
  r
end
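
The per-byte logic in the source above can be sketched more compactly as follows. FormSketch and encode_component are hypothetical names introduced only for this illustration. The set of unescaped characters is copied from the regular expression in the source.

```ruby
# Hypothetical stand-in module; a sketch of the same per-byte encoding,
# not the library's own code.
module FormSketch
  # Bytes in this set are emitted as-is; all others are percent-encoded.
  SAFE = %r{[0-9A-Za-z\-._~:/?@!\$'()*,]}n

  def self.encode_component(str)
    str.each_byte.map { |byte|
      ch = byte.chr
      SAFE.match?(ch) ? ch : format('%%%02X', byte)
    }.join
  end

  def self.html_form(pairs, sep = '&')
    pairs.map { |k, v| "#{encode_component(k)}=#{encode_component(v)}" }.join(sep)
  end
end

puts FormSketch.html_form([["k=", "&;="]])            # prints: k%3D=%26%3B%3D
puts FormSketch.html_form([["q", "a b"]])             # prints: q=a%20b
puts FormSketch.html_form({ "a" => "b", "c" => "d" }, ";")  # prints: a=b;c=d
```

In this sketch a space is encoded as %20 rather than "+", because the space character is not in the unescaped set.
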
html_text(str)

#html_text escapes a string for use as HTML text, using character references.

It escapes 3 characters:

  • '&' to '&amp;'

  • '<' to '&lt;'

  • '>' to '&gt;'

Escape.html_text("abc") #=> "abc"
Escape.html_text("a & b < c > d") #=> "a &amp; b &lt; c &gt; d"

This function is not appropriate for escaping HTML attribute values, because quotes are not escaped.

# File lib/escape.rb, line 218
def html_text(str)
  str.gsub(/[&<>]/) {|ch| HTML_TEXT_ESCAPE_HASH[ch] }
end
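
The warning about attributes can be made concrete with a small sketch. HTML_TEXT_ESCAPE_HASH is not shown in this listing, so its contents are assumed from the three-character list above, and TextSketch is a hypothetical stand-in module.

```ruby
# Hypothetical stand-in for the library. The hash contents are assumed
# from the documented list of three escaped characters.
module TextSketch
  HTML_TEXT_ESCAPE_HASH = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

  def self.html_text(str)
    str.gsub(/[&<>]/) { |ch| HTML_TEXT_ESCAPE_HASH[ch] }
  end
end

# Fine for element content:
puts "<p>#{TextSketch.html_text('1 < 2 & 3 > 2')}</p>"
# prints: <p>1 &lt; 2 &amp; 3 &gt; 2</p>

# Not fine inside an attribute: the double quote passes through unchanged
# and ends the attribute value early.
value = 'x" onclick="alert(1)'
broken = %(<input value="#{TextSketch.html_text(value)}">)
puts broken
# prints: <input value="x" onclick="alert(1)">
```
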
shell_command(command)

#shell_command composes a sequence of words into a single shell command line. All shell metacharacters are quoted, and the words are joined with a single space between them.

Escape.shell_command(["ls", "/"]) #=> "ls /"
Escape.shell_command(["echo", "*"]) #=> "echo '*'"

Note that system(*command) and system(Escape.shell_command(command)) are roughly the same. There are two exceptions:

  • The latter may invoke /bin/sh.

  • An array with only one element is interpreted differently: with the former, the element is parsed by the shell, but with the latter it is treated as a single word. For example, system(*["echo foo"]) invokes the echo command with the argument "foo", but system(Escape.shell_command(["echo foo"])) invokes a command named "echo foo" with no arguments (and probably fails).

# File lib/escape.rb, line 52
def shell_command(command)
  command.map {|word| shell_single_word(word) }.join(' ')
end
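
The one-element exception can be illustrated without running any command. This sketch uses Shellwords.split from the Ruby standard library as an approximation of how a shell splits a command line; that substitution is an assumption of the example, not something the Escape module does. The string "'echo foo'" is what the source above yields for ["echo foo"], since the space forces single-quoting.

```ruby
require 'shellwords'

# system(*["echo foo"]) passes the lone string as a command line,
# so it is split into a command and an argument.
unquoted = Shellwords.split("echo foo")
p unquoted  # prints: ["echo", "foo"]

# shell_command(["echo foo"]) quotes the whole element, so the shell
# sees one word and looks for a command literally named "echo foo".
quoted = Shellwords.split("'echo foo'")
p quoted    # prints: ["echo foo"]
```
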
shell_single_word(str)

#shell_single_word quotes shell meta characters.

The result is always a single shell word, even if the argument is "". #shell_single_word("") returns "''".

Escape.shell_single_word("") #=> "''"
Escape.shell_single_word("foo") #=> "foo"
Escape.shell_single_word("*") #=> "'*'"

# File lib/escape.rb, line 65
def shell_single_word(str)
  if str.empty?
    "''"
  elsif %r{\A[0-9A-Za-z+,./:=@_-]+\z} =~ str
    str
  else
    result = ''
    str.scan(/('+)|[^']+/) {
      if $1
        result << %q{\'} * $1.length
      else
        result << "'#{$&}'"
      end
    }
    result
  end
end
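
The quoting rule is easiest to see on a word that contains a single quote. The sketch below mirrors the source above under a hypothetical module name, ShellSketch, and checks the result by splitting it again with Shellwords.split from the standard library, which is used here only as an approximation of shell word parsing.

```ruby
require 'shellwords'

# Hypothetical stand-in module mirroring the source above; not the library itself.
module ShellSketch
  def self.shell_single_word(str)
    return "''" if str.empty?
    return str if %r{\A[0-9A-Za-z+,./:=@_-]+\z}.match?(str)

    result = +''
    str.scan(/('+)|[^']+/) do
      # A run of single quotes becomes one \' per quote;
      # any other run is wrapped in single quotes.
      result << ($1 ? "\\'" * $1.length : "'#{$&}'")
    end
    result
  end

  def self.shell_command(command)
    command.map { |word| shell_single_word(word) }.join(' ')
  end
end

words = ["echo", "it's", "a b", "*"]
line = ShellSketch.shell_command(words)
puts line
# prints: echo 'it'\''s' 'a b' '*'

# Splitting the line again should give back the original words.
round_trip = Shellwords.split(line)
p round_trip == words
```

The word "it's" is emitted as three adjacent pieces ('it', \', and 's'), which a shell concatenates back into one word.
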
uri_path(str)

#uri_path escapes a URI path using percent-encoding. The given path should be a sequence of unescaped segments separated by "/". A segment cannot contain "/".

Escape.uri_path("a/b/c") #=> "a/b/c"
Escape.uri_path("a?b/c?d/e?f") #=> "a%3Fb/c%3Fd/e%3Ff"

The path is the part of a URI after the authority and before the query, as follows.

scheme://authority/path?query#fragment

See RFC 3986 for details of URI.

Note that this function is not appropriate for converting an OS path to a URI.

# File lib/escape.rb, line 115
def uri_path(str)
  str.gsub(%r{[^/]+}n) { uri_segment($&) }
end
uri_segment(str)

#uri_segment escapes a URI segment using percent-encoding.

Escape.uri_segment("a/b") #=> "a%2Fb"

A segment is one of the "/"-separated elements between the authority and the query in a URI, as follows.

scheme://authority/segment1/segment2/.../segmentN?query#fragment

See RFC 3986 for details of URI.

# File lib/escape.rb, line 92
def uri_segment(str)
  # pchar - pct-encoded = unreserved / sub-delims / ":" / "@"
  # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
  str.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}n) {
    '%' + $&.unpack("H2")[0].upcase
  }
end
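
The relationship between uri_segment and uri_path can be sketched as follows. UriSketch is a hypothetical stand-in module. Unlike the source above, this sketch first converts the input to a binary string with String#b so that multi-byte characters are encoded one byte at a time; that is a choice made for the example, not a statement about the library's behavior.

```ruby
# Hypothetical stand-in module; a byte-wise sketch, not the library's own code.
module UriSketch
  # Percent-encodes every byte outside unreserved / sub-delims / ":" / "@".
  def self.uri_segment(str)
    str.b.gsub(/[^A-Za-z0-9\-._~!\$&'()*+,;=:@]/n) { format('%%%02X', $&.ord) }
  end

  # Leaves "/" separators alone and escapes each run between them as a segment.
  def self.uri_path(str)
    str.b.gsub(%r{[^/]+}n) { uri_segment($&) }
  end
end

puts UriSketch.uri_segment("a/b")          # prints: a%2Fb
puts UriSketch.uri_path("a?b/c?d/e?f")     # prints: a%3Fb/c%3Fd/e%3Ff
puts UriSketch.uri_path("a b/caf\u00e9")   # prints: a%20b/caf%C3%A9
```

The first two calls show the difference: a "/" inside a segment is escaped, while a "/" in a path is kept as a separator.
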