body { width 85% }; A { background: #FFF; color: #44F; } A:hover { color: #20b2aa; } A.anchor { color: #000; } A.anchor:hover { color: #000; } P { text-indent: 2em; } ul li { list-style-type: lower-roman; }
/* As the saying goes:
* "you can tell a lot about the code from the data structure".
*
* So this is the non-magic version of struct Ustr:
*
* struct Ustr
* {
* const char *ptr;
* size_t reference_count;
* size_t length;
* size_t size;
*
* unsigned char flag_is_readonly : 1;
* unsigned char flag_is_fixed : 1;
* unsigned char flag_exact_byte_allocations : 1;
* unsigned char flag_has_enomem_error : 1;
* unsigned char flag_fail_on_alloc : 1;
* };
*
* ...however this "eats" memory for "small" strings, which is one reason given
* why people don't use a real string ADT.
*
*/
struct Ustr { unsigned char data[1]; /* 0b_wxzy_nnoo => * * w = allocated * x = has size * y = round up allocations (off == exact allocations) * z = memory error * nn = refn * oo = lenn * * 0b_0000_0000 = "", const, no alloc fail, no mem err, refn=0, lenn=0 */ }; struct Ustrp { /* This is just used as a way to catch errors at compile time with static * typing. You might think we'd define Ustrp to be identical to Ustr, and * just cast however aliasing rules screw us over if we do that. */ struct Ustr s; };
/* Unused bit patterns for data[0]:
*
* 0bxxxx_x100 == lenn = 0
* 0bxxxx_1x00
* 0bxxx1_xx00
* 0bxx1x_xx00
* 0bx1xx_xx00 => is used in sized
* 0b1xxx_xx00
*
* 0bx1xx_xx11 == 128bit size_t, yeh!
* 0bx1xx_11xx
*
* 0b0xxx_x1xx == refn values for const/fixed
* 0b0xxx_1xxx
*
* 0b00x1_xxxx == mem error for const
* 0b001x_xxxx == not exact for const
*
*/
/*
* ================ English description of struct Ustr ================
*
* This strucure is very magic, as an optimization "" is a valid "structure"
* for an empty string, ustr_len("") == 0, ustr_size("") == 0,
* ustr_cstr("") == "".
* This also works for other "static data strings" if they start with the first
* four bits as 0 ... although for C const strings using this, the length has
* to be manually defined correctly at compile time.
*
* However it is also a "space efficient" String ADT, with a managed length,
* _either_ with or without reference counts (so that ustr_dup() doesn't copy)
* and with or without a stored size (so growing/shrinking a lot doesn't
* cause lots of malloc/realloc usage). Note that for lengths of strings, or
* reference counts that need >= 8 bytes of storage the Ustr _much_ contain
* a stored size.
*
* It can also track memory allocation errors over multiple function calls.
*
* If the first byte == 0, then it's "" as above.
* Otherwise the first byte contains four flags, and 2 numbers (2 bits each).
*
* The 1st flag says if the string is allocated. The 2nd flag says if it has a
* stored size. The 3rd flag says if it isn't using "exact allocations", if it
* is ustr_len() + ustr_overhead() == ustr_size_alloc(), otherwise there might
* be room to grow without doing a memory allocation call[1]. The 4th flag says
* if this Ustr has had a memory allocation error.
* The two numbers are the mappings for the length of the reference count, and
* the length of the length. Note that this mapping changes if the 2nd flag is
* set.
*
* [1] There is an implied "size" of the string, based an how big it is. This
* is a concession for speed efficiency, although it only grows at 1.5x
* not 2x which is the norm. (again, to reduce memory waste). Again if this is
* too much overhead, just use exact sized allocations.
*
* Also NOTE if the 1st and 2nd flags are set, this means that the Ustr is in
* fixed storge (like an auto array). Also if the 1st, 2nd and 3rd flags are
* set, this means that the Ustr is limited, Ie. it's in fixed storge and
* _cannot_ grow (all allocation attempts will fail).
*
*
* Next, possibly, comes the reference count (of the given length[2]).
* Next, if not "", comes the length of the data (of the given length[2]).
* Next, if non-zero length, comes the data, which can include NIL bytes.
* And finally, if not "", a NIL byte, to make sure it's always a valid C-Style
* string (although obviously any embeded NIL bytes will make it look shorter
* if something else just uses strlen(ustr_cstr())).
*
* [2] The sizes can currently be 1, 2, 4 or 8 bytes. Depending on the mapping
* of the value in the first byte (length changes dynamically, as you add data).
*
* Examples:
*
* The allocated string "a", using exact allocations and no reference
* counting, is the 4 bytes:
*
* {0x81, 0x01, 'a', 0x00}
*
*
* The allocated string "ab", using non-exact allocations and 1 byte reference
* counting (shared by 13 users), is the 6 bytes (there is no waste due to
* non-exact allocations):
*
* {0xA5, 0x0D, 0x02, 'a', 'b', 0x00}
*
*
* The allocated string "abcdefghijklmnop" (16 bytes, and a NIL), with
* non-exact allocations 2 byte reference counting (shared by 1,003 users), is
* the 24 bytes (with 3 bytes of "unused" space at the end):
*
* {0xA9, 0x03, 0xEB, 0x10, 'a', 'b', [...], 'o', 'p', 0x00, <x>, <x>, <x>}
*/
This design allows 83 different combinations of options for your string types:
Ustr *s1 = USTR(""); Ustr *s2 = USTR1(\x4, "abcd");
char buf_s3[USTR_SIZE_FIXED(128)] = "x112233abcd"; /* x = the info byte 11 = reference count storage - always two bytes 22 = len storage - always same size as the size 33 = size storage - uses 2, 4 or 8 bytes depending on sizeof() */ Ustr *s3 = USTR_SC_INIT_AUTO(buf_s3, USTR_FALSE, 4); /* s3 is now the string "abcd" */
char buf_s3[USTR_SIZE_FIXED(32)] = USTR_BEG_FIXED2 "abcd"; Ustr *s3 = USTR_SC_INIT_AUTO(buf_s3, USTR_TRUE, 4); /* s3 is now the string "abcd" */