Libparserutils
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
Macros
#define UTF8_TO_UCS4(s, len, ucs4, clen, error)
    Convert a UTF-8 multibyte sequence into a single UCS-4 character.
#define UTF8_FROM_UCS4(ucs4, s, len, error)
    Convert a single UCS-4 character into a UTF-8 multibyte sequence.
#define UTF8_LENGTH(s, max, len, error)
    Calculate the length (in characters) of a bounded UTF-8 string.
#define UTF8_CHAR_BYTE_LENGTH(s, len, error)
    Calculate the length (in bytes) of a UTF-8 character.
#define UTF8_PREV(s, off, prevoff, error)
    Find previous legal UTF-8 char in string.
#define UTF8_NEXT(s, len, off, nextoff, error)
    Find next legal UTF-8 char in string.
#define UTF8_NEXT_PARANOID(s, len, off, nextoff, error)
    Skip to start of next sequence in UTF-8 input.
Variables
const uint8_t numContinuations[256]
    Number of continuation bytes for a given start byte.
UTF-8 manipulation macros (implementation).
Definition in file utf8impl.h.
#define UTF8_CHAR_BYTE_LENGTH(s, len, error)
Calculate the length (in bytes) of a UTF-8 character.
s | Pointer to start of character
len | Pointer to location to receive length
error | Location to receive error code
Definition at line 228 of file utf8impl.h.
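The idea behind a byte-length lookup can be sketched as a standalone function. This is an illustration only, not the macro's implementation: the name `sketch_char_byte_length` and its return convention (0 for a byte that cannot start a sequence) are assumptions made for this example, and the ranges follow the RFC 2279 lead-byte layout of up to six bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Returns the sequence length implied by the lead byte at s,
 * or 0 if that byte cannot start a sequence. */
size_t sketch_char_byte_length(const uint8_t *s)
{
	assert(s != NULL);

	uint8_t b = s[0];

	if (b < 0x80) return 1; /* 0xxxxxxx: ASCII */
	if (b < 0xC0) return 0; /* 10xxxxxx: continuation byte */
	if (b < 0xE0) return 2; /* 110xxxxx */
	if (b < 0xF0) return 3; /* 1110xxxx */
	if (b < 0xF8) return 4; /* 11110xxx */
	if (b < 0xFC) return 5; /* 111110xx (RFC 2279 only) */
	if (b < 0xFE) return 6; /* 1111110x (RFC 2279 only) */
	return 0;               /* 0xFE and 0xFF never appear in UTF-8 */
}
```

A table indexed by the lead byte, such as numContinuations, trades these comparisons for a single array read.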
#define UTF8_FROM_UCS4(ucs4, s, len, error)
Convert a single UCS-4 character into a UTF-8 multibyte sequence.
RFC 3629 removed the encoding of UCS values beyond the range reachable by UTF-16 (above U+10FFFF). This macro nevertheless conforms to RFC 2279 and encodes the full range given below.
ucs4 | The character to process (0 <= ucs4 <= 0x7FFFFFFF) (host endian)
s | Pointer to pointer to output buffer, updated on exit
len | Pointer to length, in bytes, of output buffer, updated on exit
error | Location to receive error code
Definition at line 123 of file utf8impl.h.
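The "updated on exit" contract for s and len can be illustrated with a standalone encoder. This is a sketch under assumptions, not the macro's code: `sketch_from_ucs4` and its 0 / -1 return values are invented for this example, and it follows the RFC 2279 layout of up to six bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Encodes ucs4 at *s, then advances *s and reduces *len by the
 * number of bytes written. Returns 0 on success, -1 if the value
 * is out of range or the buffer is too small. */
int sketch_from_ucs4(uint32_t ucs4, uint8_t **s, size_t *len)
{
	assert(s != NULL && *s != NULL && len != NULL);

	size_t n;

	if (ucs4 < 0x80) n = 1;
	else if (ucs4 < 0x800) n = 2;
	else if (ucs4 < 0x10000) n = 3;
	else if (ucs4 < 0x200000) n = 4;
	else if (ucs4 < 0x4000000) n = 5;
	else if (ucs4 <= 0x7FFFFFFF) n = 6;
	else return -1;

	if (*len < n)
		return -1;

	uint8_t *out = *s;

	if (n == 1) {
		out[0] = (uint8_t) ucs4;
	} else {
		/* Continuation bytes carry 6 payload bits each, last byte first */
		for (size_t i = n - 1; i > 0; i--) {
			out[i] = (uint8_t) (0x80 | (ucs4 & 0x3F));
			ucs4 >>= 6;
		}
		/* Lead byte: n high bits set, a zero bit, then the remaining payload */
		out[0] = (uint8_t) ((0xFF00u >> n) | ucs4);
	}

	*s += n;
	*len -= n;

	return 0;
}
```

Because the pointer and remaining length both move forward, successive calls can append characters to the same buffer without extra bookkeeping.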
#define UTF8_LENGTH(s, max, len, error)
Calculate the length (in characters) of a bounded UTF-8 string.
s | The string
max | Maximum length
len | Pointer to location to receive length of string
error | Location to receive error code
Definition at line 182 of file utf8impl.h.
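Counting characters rather than bytes amounts to stepping from lead byte to lead byte. The following sketch assumes that max is a byte bound and that a sequence cut off by the bound is an error; both are assumptions of this example, as is the name `sketch_utf8_length`. It trusts lead bytes and does not validate continuation bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Counts the characters in the first max bytes of s.
 * Returns 0 on success, -1 on an invalid lead byte or a
 * sequence that runs past the bound. */
int sketch_utf8_length(const uint8_t *s, size_t max, size_t *len)
{
	assert(s != NULL && len != NULL);

	size_t count = 0;
	size_t i = 0;

	while (i < max) {
		uint8_t b = s[i];
		size_t n;

		if (b < 0x80) n = 1;
		else if (b < 0xC0) return -1; /* stray continuation byte */
		else if (b < 0xE0) n = 2;
		else if (b < 0xF0) n = 3;
		else if (b < 0xF8) n = 4;
		else if (b < 0xFC) n = 5;
		else if (b < 0xFE) n = 6;
		else return -1;               /* 0xFE, 0xFF */

		if (max - i < n)
			return -1;            /* sequence truncated by bound */

		i += n;
		count++;
	}

	*len = count;

	return 0;
}
```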
#define UTF8_NEXT(s, len, off, nextoff, error)
Find next legal UTF-8 char in string.
s | The string (assumed valid)
len | Maximum offset in string
off | Offset in the string to start at
nextoff | Pointer to location to receive offset of first byte of next legal character
error | Location to receive error code
Definition at line 274 of file utf8impl.h.
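For input that is assumed valid, moving to the next character only requires skipping continuation bytes. A minimal sketch of that idea follows; `sketch_utf8_next`, its return values, and the choice to treat off >= len as an error are assumptions of this example rather than documented behaviour of the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Stores in *nextoff the offset of the first byte after off that is
 * not a continuation byte (10xxxxxx), or len if the string ends first.
 * Returns 0 on success, -1 if off is already at or past len. */
int sketch_utf8_next(const uint8_t *s, size_t len, size_t off,
		size_t *nextoff)
{
	assert(s != NULL && nextoff != NULL);

	if (off >= len)
		return -1;

	/* Step over the current byte, then over any continuation bytes */
	off++;
	while (off < len && (s[off] & 0xC0) == 0x80)
		off++;

	*nextoff = off;

	return 0;
}
```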
#define UTF8_NEXT_PARANOID(s, len, off, nextoff, error)
Skip to start of next sequence in UTF-8 input.
s | The string (assumed to be of dubious validity)
len | Maximum offset in string
off | Offset in the string to start at
nextoff | Pointer to location to receive offset of first byte of next legal character
error | Location to receive error code
Definition at line 303 of file utf8impl.h.
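When the input may be malformed, one cautious strategy is to consume no more continuation bytes than the lead byte announces, and to stop early if a non-continuation byte appears. The sketch below shows that strategy only; it is an assumption about what "paranoid" skipping could look like, not a description of the macro, and `sketch_utf8_next_paranoid` is a name invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Skips the byte at off plus at most as many continuation bytes as
 * that byte announces. Invalid lead bytes and stray continuation
 * bytes are skipped one byte at a time.
 * Returns 0 on success, -1 if off is already at or past len. */
int sketch_utf8_next_paranoid(const uint8_t *s, size_t len, size_t off,
		size_t *nextoff)
{
	assert(s != NULL && nextoff != NULL);

	if (off >= len)
		return -1;

	uint8_t b = s[off++];
	size_t want = 0; /* continuation bytes announced by the lead byte */

	if (b >= 0xC0 && b < 0xE0) want = 1;
	else if (b >= 0xE0 && b < 0xF0) want = 2;
	else if (b >= 0xF0 && b < 0xF8) want = 3;
	else if (b >= 0xF8 && b < 0xFC) want = 4;
	else if (b >= 0xFC && b < 0xFE) want = 5;

	/* Stop at the bound, at the announced count, or at the first
	 * byte that is not a continuation byte */
	while (want > 0 && off < len && (s[off] & 0xC0) == 0x80) {
		off++;
		want--;
	}

	*nextoff = off;

	return 0;
}
```

With this approach a truncated sequence followed by ASCII text loses only the broken bytes, since the scan resumes at the first byte that could start a new character.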
#define UTF8_PREV(s, off, prevoff, error)
Find previous legal UTF-8 char in string.
s | The string
off | Offset in the string to start at
prevoff | Pointer to location to receive offset of first byte of previous legal character
error | Location to receive error code
Definition at line 249 of file utf8impl.h.
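Stepping backwards relies on the fact that continuation bytes always match the bit pattern 10xxxxxx, so the start of the previous character is the nearest earlier byte that does not. A minimal sketch, assuming that an offset of 0 simply yields 0 (the macro's behaviour in that case is not stated here) and using the invented name `sketch_utf8_prev`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Stores in *prevoff the offset of the nearest byte before off that
 * is not a continuation byte (10xxxxxx). At off == 0 there is nothing
 * earlier, so *prevoff is set to 0. Always returns 0. */
int sketch_utf8_prev(const uint8_t *s, size_t off, size_t *prevoff)
{
	assert(s != NULL && prevoff != NULL);

	if (off > 0) {
		/* Step back one byte, then back over continuation bytes */
		off--;
		while (off > 0 && (s[off] & 0xC0) == 0x80)
			off--;
	}

	*prevoff = off;

	return 0;
}
```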
#define UTF8_TO_UCS4(s, len, ucs4, clen, error)
Convert a UTF-8 multibyte sequence into a single UCS-4 character.
RFC 3629 removed the encoding of UCS values beyond the range reachable by UTF-16 (above U+10FFFF). This macro nevertheless conforms to RFC 2279.
s | The sequence to process
len | Length of sequence
ucs4 | Pointer to location to receive UCS-4 character (host endian)
clen | Pointer to location to receive byte length of UTF-8 sequence
error | Location to receive error code
Definition at line 34 of file utf8impl.h.
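Decoding reverses the encoding: the lead byte supplies the sequence length and the top payload bits, and each continuation byte contributes six more. The sketch below is a standalone illustration under the RFC 2279 layout. The name `sketch_to_ucs4`, the 0 / -1 return values, and the rejection of overlong forms and surrogates are choices made for this example and are not taken from the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libparserutils.
 * Decodes one character from the first len bytes of s.
 * Returns 0 on success, -1 on empty, truncated or invalid input. */
int sketch_to_ucs4(const uint8_t *s, size_t len, uint32_t *ucs4,
		size_t *clen)
{
	assert(s != NULL && ucs4 != NULL && clen != NULL);

	if (len == 0)
		return -1;

	uint8_t b = s[0];
	uint32_t c, min; /* min: smallest value that needs this length */
	size_t n;        /* number of continuation bytes expected */

	if (b < 0x80) { c = b; n = 0; min = 0; }
	else if ((b & 0xE0) == 0xC0) { c = b & 0x1F; n = 1; min = 0x80; }
	else if ((b & 0xF0) == 0xE0) { c = b & 0x0F; n = 2; min = 0x800; }
	else if ((b & 0xF8) == 0xF0) { c = b & 0x07; n = 3; min = 0x10000; }
	else if ((b & 0xFC) == 0xF8) { c = b & 0x03; n = 4; min = 0x200000; }
	else if ((b & 0xFE) == 0xFC) { c = b & 0x01; n = 5; min = 0x4000000; }
	else return -1; /* continuation byte, 0xFE or 0xFF */

	if (len < n + 1)
		return -1; /* sequence truncated */

	for (size_t i = 1; i <= n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return -1; /* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}

	if (c < min)
		return -1; /* overlong encoding */
	if (c >= 0xD800 && c <= 0xDFFF)
		return -1; /* UTF-16 surrogate, rejected by this sketch */

	*ucs4 = c;
	*clen = n + 1;

	return 0;
}
```

Returning the consumed byte count alongside the character lets a caller advance through a buffer one character at a time.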