The transformation specification essentially has the form of C code, except
that lines to remove are annotated with -
in the first column, and
lines to add are annotated with +
. A transformation specification
can also use dots, “...
”, describing an arbitrary sequence
of function arguments or instructions within a control-flow path.
Implicitly, “...
” matches the shortest path between something that
matches the pattern before the dots (or the beginning of the function, if
there is nothing before the dots) and something that matches the pattern
after the dots (or the end of the function, if there is nothing after the
dots). Dots may be modified with a when clause, indicating a pattern
that should not occur anywhere within the matched sequence. when any
removes the aforementioned constraint that “...
” matches the
shortest path. Finally, a transformation can specify a disjunction of
patterns, of the form ( pat1 | … |
patn ) where each (, | or ) is
in column 0 or preceded by \.
The grammar that we present for the transformation is not actually the grammar of the SmPL code that can be written by the programmer, but is instead the grammar of the slice of this consisting of the - annotated and the unannotated code (the context of the transformed lines), or the + annotated code and the unannotated code. For example, for parsing purposes, the following transformation is split into the two variants shown below and each is parsed separately.
1 proc_info_func(...) {
2 <...
3 - hostno
4 + hostptr->host_no
5 ...>
6 } |
1 proc_info_func(...) {
2 <...
3 - hostno
4 ...>
5 } |
1 proc_info_func(...) {
2 <...
3 + hostptr->host_no
4 ...>
5 } |
Requiring that both slices parse correctly ensures that the rule matches syntactically valid C code and that it produces syntactically valid C code. The generated parse trees are then merged for use in the subsequent matching and transformation process.
The grammar for the minus or plus slice of a transformation is as follows:
transformation | ::= | include + |
| | OPTDOTSEQ(top, when) | |
include | ::= | #include include_string |
top | ::= | expr |
| | decl_stmt + | |
| | fundecl | |
when | ::= | when != when_code |
| | when = rule_elem_stmt | |
| | when COMMA_LIST(any_strict) | |
| | when true != expr | |
| | when false != expr | |
when_code | ::= | OPTDOTSEQ(decl_stmt +, when) |
| | OPTDOTSEQ(expr, when) | |
rule_elem_stmt | ::= | one_decl |
| | expr; | |
| | return [expr]; | |
| | break; | |
| | continue; | |
| | \(rule_elem_stmt (\| rule_elem_stmt) +\) | |
any_strict | ::= | any |
| | strict | |
| | forall | |
| | exists |
OPTDOTSEQ(grammar_ds, when_ds) | ::= | |
[... (when_ds) *] grammar_ds (... (when_ds) * grammar_ds) * [... (when_ds) *] |
Lines may be annotated with an element of the set {-, +, *} or the singleton ?, or one of each set. ? represents at most one match of the given pattern, ie a match of the pattern is optional. * is used for semantic match, i.e., a pattern that highlights the fragments annotated with *, but does not perform any modification of the matched code. * cannot be mixed with - and +. There are some constraints on the use of these annotations:
Each element of a disjunction must be a proper term like an expression, a statement, an identifier or a declaration. Thus, the rule on the left below is not a syntactically correct SmPL rule. One may use the rule on the right instead.
1 @@
2 type T;
3 T b;
4 @@
5
6 (
7 writeb(...,
8 |
9 readb(...,
10 )
11 -(T)
12 b) |
1 @@
2 type T;
3 T b;
4 @@
5
6 (
7 read
8 |
9 write
10 )
11 (...,
12 - (T)
13 b) |
Some kinds of terms can only appear in + code. These include comments, ifdefs, and attributes (__attribute__((...))).