cprover
Data structures: core structures and AST
Author
Martin Brain, Peter Schrammel, Owen Jones

Strings: dstringt, the string_container and the ID_*

Within cbmc, strings are represented using irep_idt. By default this is typedefed to dstringt, which stores a string as an index into a large static table of strings. This makes it easy to compare if two irep_idts are equal (just compare the index) and it makes it efficient to store many copies of the same string. The static list of strings is initially populated from irep_ids.def, so for example the fourth entry in irep_ids.def is “IREP_ID_ONE(type)”, so the string “type” has index 3. You can refer to this irep_idt as ID_type. The other kind of line you see is “IREP_ID_TWO(C_source_location, #source_location)”, which means the irep_idt for the string “::source_location” can be referred to as ID_C_source_location. The “C” is for comment, which will be explained in the next section. Any strings that need to be stored as irep_ids which aren't in irep_ids.def are added to the end of the table when they are first encountered, and the same index is used for all instances.

See documentation at dstringt.

irept: a 4-triple (data, named-sub, comments, sub)

See documentation at irept.

As that documentation says, irepts are generic tree nodes. You should think of them as having a single string (data, actually an irep_idt) and lots of child nodes, some of which are numbered (sub) and some of which are labelled, and the label can either start with a “::” (comments-sub) or without one (named-sub). The meaning of the “::” is that this child should not be considered important, for example it shouldn’t be counted when comparing two irepts for equality.

typet

To be documented.

symbol_typet

To be documented.

exprt

exprt is the class to represent an expression. It inherits from irept, and the only things it adds to it are that every exprt has a named sub containing its type and everything in the sub of an exprt is again an exprt, not just an irept. You can think of exprt as a node in the abstract syntax tree for an expression. There are many classes that inherit from exprt and which represent more specific things. For example, there is minus_exprt, which has a sub of size 2 (for the two argument of minus).

Recall that every irept has one piece of data of its own, i.e. its id(), and all other information is in its named_sub, comments or sub. For exprts, the id() is used to specify why kind of exprt this is, so a minus_exprt has ID_minus as its id(). This means that a minus_exprt can be passed wherever an exprt is expected, and if you want to check if the expression you are looking at is a minus expression then you have to check its id() (or use can_cast_expr<minus_exprt>).

codet

exprt represents expressions and codet represents statements. codet inherits from exprt, so all codets are exprts, with id() ID_code. Many different kinds of statements inherit from codet, and they are distinguished by their statement(). For example, a while loop would be represented by a code_whilet, which has statement() ID_while. code_whilet has two operands in its sub, and helper functions to make it easier to use: cond() returns the first subexpression, which is the condition for the while loop, and body() returns the second subexpression, which is the body of the while loop - this has to be a codet, because the body of a while loop is a statement.

symbolt, symbol_table, and namespacet

To be documented.

Symbol lifetimes, symbol modes, name, base-name, pretty-name; semantic of lifetimes for symex?

To be documented.

Storing symbols and hiding symbols (namespacet)

To be documented.

ns.follow

To be documented.

Examples: how to represent the AST of statements, in C and in java

x = y + 123

To be documented..

if (x > 10) { y = 2 } else { v[3] = 4 }

To be documented.

Java arrays: struct Array { int length, int *data };

To be documented.