module Escape
The Escape module provides several escape functions for:
- URI
- HTML
- shell command
Public Instance Methods
#html_attr encodes a string as a double-quoted HTML attribute using character references.
Escape.html_attr("abc") #=> "\"abc\""
Escape.html_attr("a&b") #=> "\"a&amp;b\""
Escape.html_attr("ab&<>\"c") #=> "\"ab&amp;&lt;&gt;&quot;c\""
Escape.html_attr("a'c") #=> "\"a'c\""
It escapes 4 characters:
- '&' to '&amp;'
- '<' to '&lt;'
- '>' to '&gt;'
- '"' to '&quot;'
# File lib/escape.rb, line 244
def html_attr(str)
  '"' + str.gsub(/[&<>"]/) {|ch| HTML_ATTR_ESCAPE_HASH[ch] } + '"'
end
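As a rough illustration of how the quoted result can be dropped straight into markup, the following sketch uses a hypothetical stand-in (`quote_attr` and `ATTR_ESCAPES` are names invented here, not part of the library) that mirrors the four replacements listed above.

```ruby
# Hypothetical stand-in for Escape.html_attr, written only for this example.
ATTR_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def quote_attr(str)
  # The surrounding double quotes are part of the return value.
  '"' + str.gsub(/[&<>"]/) { |ch| ATTR_ESCAPES[ch] } + '"'
end

title = 'Tom & "Jerry"'
tag = "<a title=#{quote_attr(title)}>link</a>"
puts tag
```

Because the return value already carries its own double quotes, the template writes `title=` with no quotes of its own.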
#html_form composes HTML form key-value pairs into an x-www-form-urlencoded string.
#html_form takes an array of string pairs or a hash from string to string.
Escape.html_form([["a","b"], ["c","d"]]) #=> "a=b&c=d"
Escape.html_form({"a"=>"b", "c"=>"d"}) #=> "a=b&c=d"
In the array form, the same key can be used more than once. (This is required for an HTML form that contains checkboxes or a select element with the multiple attribute.)
Escape.html_form([["k","1"], ["k","2"]]) #=> "k=1&k=2"
If the strings contain characters that must be escaped in x-www-form-urlencoded, they are escaped using %-encoding.
Escape.html_form([["k=","&;="]]) #=> "k%3D=%26%3B%3D"
The separator can be specified by the optional second argument.
Escape.html_form([["a","b"], ["c","d"]], ";") #=> "a=b;c=d"
See HTML 4.01 for details.
# File lib/escape.rb, line 164
def html_form(pairs, sep='&')
  r = ''
  first = true
  pairs.each {|k, v|
    # query-chars - pct-encoded - x-www-form-urlencoded-delimiters =
    #   unreserved / "!" / "$" / "'" / "(" / ")" / "*" / "," / ":" / "@" / "/" / "?"
    # query-char - pct-encoded = unreserved / sub-delims / ":" / "@" / "/" / "?"
    # query-char = pchar / "/" / "?" = unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?"
    # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    # x-www-form-urlencoded-delimiters = "&" / "+" / ";" / "="
    r << sep if !first
    first = false
    k.each_byte {|byte|
      ch = byte.chr
      if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
        r << "%" << ch.unpack("H2")[0].upcase
      else
        r << ch
      end
    }
    r << '='
    v.each_byte {|byte|
      ch = byte.chr
      if %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}n =~ ch
        r << "%" << ch.unpack("H2")[0].upcase
      else
        r << ch
      end
    }
  }
  r
end
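The byte-by-byte rule in the listing above can be condensed into a short sketch. `form_encode` and `UNSAFE_FORM_BYTE` are hypothetical names used only here; the character class is copied from the source, and the use of `String#b` to force byte-wise matching is an assumption of this sketch rather than something the library does.

```ruby
# Hypothetical stand-in for Escape.html_form; not the library's implementation.
# Any byte outside this set is percent-encoded.
UNSAFE_FORM_BYTE = %r{[^0-9A-Za-z\-\._~:/?@!\$'()*,]}

def form_encode(pairs, sep = '&')
  pairs.map { |key, value|
    [key, value].map { |s|
      # String#b returns a binary copy, so every byte is matched on its own.
      s.b.gsub(UNSAFE_FORM_BYTE) { |byte| format('%%%02X', byte.ord) }
    }.join('=')
  }.join(sep)
end

puts form_encode([["q", "ruby & rails"], ["lang", "é"]])
puts form_encode({"a" => "b", "c" => "d"}, ";")
```

Note that a space is outside the safe set, so this rule yields `%20` rather than `+`, and a multi-byte UTF-8 character becomes one `%XX` group per byte.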
#html_text escapes a string for use as HTML text, using character references.
It escapes 3 characters:
- '&' to '&amp;'
- '<' to '&lt;'
- '>' to '&gt;'
Escape.html_text("abc") #=> "abc"
Escape.html_text("a & b < c > d") #=> "a &amp; b &lt; c &gt; d"
This function is not appropriate for escaping HTML element attributes because quotes are not escaped.
# File lib/escape.rb, line 218
def html_text(str)
  str.gsub(/[&<>]/) {|ch| HTML_TEXT_ESCAPE_HASH[ch] }
end
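To make the warning about attributes concrete, here is a small sketch with a hypothetical stand-in (`escape_text` and `TEXT_ESCAPES` are invented for this example) that applies the same three replacements. It shows text content being escaped safely, and then a double quote slipping through when the result is misused inside an attribute.

```ruby
# Hypothetical stand-in for Escape.html_text, written only for this example.
TEXT_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def escape_text(str)
  str.gsub(/[&<>]/) { |ch| TEXT_ESCAPES[ch] }
end

# Safe: the result is used as element content.
paragraph = "<p>#{escape_text('1 < 2 && 3 > 2')}</p>"
puts paragraph

# Unsafe: the double quote is left alone, so it ends the attribute early.
value = 'x" onclick="alert(1)'
broken = %(<input value="#{escape_text(value)}">)
puts broken
```

For attribute values, a function that also escapes the double quote, such as #html_attr, is the better fit.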
#shell_command composes a sequence of words into a single shell command line. All shell meta characters are quoted and the words are joined with single spaces.
Escape.shell_command(["ls", "/"]) #=> "ls /"
Escape.shell_command(["echo", "*"]) #=> "echo '*'"
Note that system(*command) and system(Escape.shell_command(command)) are roughly equivalent, with two exceptions:
- The latter may invoke /bin/sh.
- An array with only one element is interpreted differently: with the former the element is parsed by the shell, but with the latter it is treated as a single word. For example, system(*["echo foo"]) invokes the echo command with the argument "foo", but system(Escape.shell_command(["echo foo"])) tries to invoke a command named "echo foo" with no arguments (and it probably fails).
# File lib/escape.rb, line 52
def shell_command(command)
  command.map {|word| shell_single_word(word) }.join(' ')
end
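One way to check the "each element stays a single word" behaviour without running a shell is to split the composed line again. The sketch below uses hypothetical stand-ins (`quote_word`, `join_command`) that follow the quoting rule shown in this module, and parses the result with `Shellwords.split` from Ruby's standard library, which only approximates Bourne shell parsing.

```ruby
require 'shellwords'

# Hypothetical stand-ins for Escape.shell_single_word and Escape.shell_command.
def quote_word(str)
  return "''" if str.empty?
  return str if %r{\A[0-9A-Za-z+,./:=@_-]+\z}.match?(str)
  # Runs of single quotes become \' each; everything else is single-quoted.
  str.gsub(/('+)|[^']+/) { $1 ? "\\'" * $1.length : "'#{$&}'" }
end

def join_command(words)
  words.map { |word| quote_word(word) }.join(' ')
end

words = ["echo", "it's", "*", "a b"]
line = join_command(words)
puts line

# Splitting the line again recovers the original words.
puts Shellwords.split(line) == words

# A one-element array stays one word, spaces included.
single = join_command(["echo foo"])
puts single
```

The last case mirrors the second exception above: the composed line names a command called "echo foo" rather than echo with an argument.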
#shell_single_word quotes shell meta characters.
The result is always a single shell word, even if the argument is "". #shell_single_word("") returns "''".
Escape.shell_single_word("") #=> "''"
Escape.shell_single_word("foo") #=> "foo"
Escape.shell_single_word("*") #=> "'*'"
# File lib/escape.rb, line 65
def shell_single_word(str)
  if str.empty?
    "''"
  elsif %r{\A[0-9A-Za-z+,./:=@_-]+\z} =~ str
    str
  else
    result = ''
    str.scan(/('+)|[^']+/) {
      if $1
        result << %q{\'} * $1.length
      else
        result << "'#{$&}'"
      end
    }
    result
  end
end
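The final branch of the listing is easier to follow when the pieces are kept separate instead of appended to one string. The sketch below is an illustration only; `quoted_pieces` is a hypothetical helper, not part of the library, and it applies the same scan pattern to show what each run of characters turns into.

```ruby
# Hypothetical helper: returns the quoted pieces before they are concatenated.
def quoted_pieces(str)
  pieces = []
  str.scan(/('+)|[^']+/) do
    if $1
      # A run of n single quotes becomes n copies of \'
      pieces << "\\'" * $1.length
    else
      # Any other run is wrapped in single quotes as-is.
      pieces << "'#{$&}'"
    end
  end
  pieces
end

p quoted_pieces("it's")
p quoted_pieces("a''b")
puts quoted_pieces("it's").join
```

Concatenating the pieces yields one shell word, because the shell joins adjacent quoted and backslash-escaped parts that have no whitespace between them.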
#uri_path escapes a URI path using percent-encoding. The given path should be a sequence of (non-escaped) segments separated by "/". The segments cannot contain "/".
Escape.uri_path("a/b/c") #=> "a/b/c"
Escape.uri_path("a?b/c?d/e?f") #=> "a%3Fb/c%3Fd/e%3Ff"
The path is the part of a URI after the authority and before the query, as follows.
scheme://authority/path?query#fragment
See RFC 3986 for details of URI.
Note that this function is not appropriate for converting an OS path to a URI.
# File lib/escape.rb, line 115
def uri_path(str)
  str.gsub(%r{[^/]+}n) { uri_segment($&) }
end
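Since a segment passed through the path function cannot itself contain "/", names that may contain a slash need to be escaped one segment at a time and then joined. The following sketch shows both routes using hypothetical stand-ins (`escape_segment`, `escape_path`); the character class is taken from this module's source, while the `String#b` call is an assumption made here to get byte-wise encoding.

```ruby
# Hypothetical stand-ins for Escape.uri_segment and Escape.uri_path.
def escape_segment(str)
  str.b.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}) { |byte| format('%%%02X', byte.ord) }
end

def escape_path(str)
  # Every run of non-slash characters is treated as one segment.
  str.gsub(%r{[^/]+}) { |segment| escape_segment(segment) }
end

puts escape_path("a?b/c?d/e?f")

# A name containing "/" must be escaped per segment, then joined.
segments = ["docs", "a/b", "50% off"]
joined = segments.map { |s| escape_segment(s) }.join("/")
puts joined

# Passing the same names as one path string splits "a/b" into two segments.
puts escape_path(segments.join("/"))
```

The last two lines of output differ, which is why a pre-joined string is only suitable when no segment contains "/".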
#uri_segment escapes a URI segment using percent-encoding.
Escape.uri_segment("a/b") #=> "a%2Fb"
A segment is one of the "/"-separated elements after the authority and before the query in a URI, as follows.
scheme://authority/segment1/segment2/.../segmentN?query#fragment
See RFC 3986 for details of URI.
# File lib/escape.rb, line 92
def uri_segment(str)
  # pchar - pct-encoded = unreserved / sub-delims / ":" / "@"
  # unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  # sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
  str.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}n) {
    '%' + $&.unpack("H2")[0].upcase
  }
end
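The grammar comments in the listing imply that sub-delims, ":" and "@" pass through unchanged while everything else is encoded. A minimal sketch of that rule follows; `encode_segment` is a hypothetical name, and working on a binary copy via `String#b` is an assumption of this example so that multi-byte characters are encoded one byte at a time.

```ruby
# Hypothetical stand-in for Escape.uri_segment, for illustration only.
def encode_segment(str)
  str.b.gsub(%r{[^A-Za-z0-9\-._~!$&'()*+,;=:@]}) { |byte| format('%%%02X', byte.ord) }
end

puts encode_segment("a/b")          # "/" is encoded
puts encode_segment("key=val;v=1")  # sub-delims are left as they are
puts encode_segment("naïve file")   # each UTF-8 byte gets its own %XX group
```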