tango.text.Util

License:
BSD style

Version:
Apr 2004: Initial release
Dec 2006: South Seas version

Author:
Kris



Placeholder for a variety of wee functions. These functions are all templated with the intent of being used for arrays of char, wchar, and dchar. However, they operate correctly with other array types also.

Several of these functions return an index value, representing where some criterion was matched. When the criterion is not matched, the functions return the length of the array provided to them. That is, in those scenarios where C functions would typically return -1, these functions return length instead. This works nicely with D slices:
        auto text = "happy:faces";

        assert (text[0 .. locate (text, ':')] == "happy");

        assert (text[0 .. locate (text, '!')] == "happy:faces");
The contains() function is more convenient for trivial lookup cases:
        if (contains ("fubar", '!'))
            ...
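The length-on-miss convention described above can be sketched without Tango. The function locateSketch below is a hypothetical stand-in written for illustration, not the module's implementation; it only mirrors the documented return values.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for locate(): returns the index of the first
// match at or after 'start', or source.length when nothing matches.
size_t locateSketch(T)(const(T)[] source, const(T) match, size_t start = 0)
{
    for (size_t i = start; i < source.length; ++i)
        if (source[i] == match)
            return i;
    return source.length;
}

unittest
{
    auto text = "key=value";

    // Hit: the slice stops just before the '='.
    assert(text[0 .. locateSketch(text, '=')] == "key");

    // Miss: the index equals text.length, so the slice is the whole array.
    assert(text[0 .. locateSketch(text, '#')] == text);

    // A miss is detected by comparing against the length, not against -1.
    assert(locateSketch(text, '#') == text.length);
}
```

Because a miss yields a valid slice bound, the caller can slice first and test afterwards, with no special case for "not found".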
Note that where a function expects a size_t argument, the D template-matching algorithm will fail if an int is provided instead. This is typically the cause of "template not found" errors. Also note that name overloading is not supported cleanly by IFTI (implicit function template instantiation) at this time, so it is not applied here.



Applying the D "import alias" mechanism to this module is highly recommended, in order to limit namespace pollution:
        import Util = tango.text.Util;

        auto s = Util.trim ("  foo ");
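The effect of a renamed import on the local namespace can be checked at compile time. The sketch below uses Phobos' std.string purely as a stand-in so that it builds without Tango; the alias name Str is an arbitrary choice for the example.

```d
// Build and run with: dmd -unittest -main -run sketch.d
import Str = std.string;   // renamed import; Phobos stands in for Tango here

unittest
{
    // Names are reachable through the alias ...
    assert(Str.strip("  foo ") == "foo");

    // ... but are not added to the local scope, so the unqualified
    // call does not compile.
    static assert(!__traits(compiles, strip("  foo ")));
}
```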


Function templates:
        trim (source)                               // trim whitespace (both ends)
        triml (source)                              // trim whitespace (left)
        trimr (source)                              // trim whitespace (right)
        strip (source, match)                       // trim elements
        stripl (source, match)                      // trim elements
        stripr (source, match)                      // trim elements
        chopl (source, match)                       // trim pattern match
        chopr (source, match)                       // trim pattern match
        delimit (src, set)                          // split on delims
        split (source, pattern)                     // split on pattern
        splitLines (source)                         // split on lines
        head (source, pattern, tail)                // split to head & tail
        join (source, postfix, output)              // join text segments
        prefix (dst, prefix, content...)            // prefix text segments
        postfix (dst, postfix, content...)          // postfix text segments
        combine (dst, prefix, postfix, content...)  // combine lotsa stuff
        repeat (source, count, output)              // repeat source
        replace (source, match, replacement)        // replace chars
        substitute (source, match, replacement)     // replace/remove matches
        count (source, match)                       // count instances
        contains (source, match)                    // has char?
        containsPattern (source, match)             // has pattern?
        index (source, match, start)                // find match index
        locate (source, match, start)               // find char
        locatePrior (source, match, start)          // find prior char
        locatePattern (source, match, start)        // find pattern
        locatePatternPrior (source, match, start)   // find prior pattern
        indexOf (s*, match, length)                 // low-level lookup
        mismatch (s1*, s2*, length)                 // low-level compare
        matching (s1*, s2*, length)                 // low-level compare
        isSpace (match)                             // is whitespace?
        unescape (source, output)                   // convert '\' prefixes
        layout (destination, format ...)            // featherweight printf
        lines (str)                                 // foreach lines
        quotes (str, set)                           // foreach quotes
        delimiters (str, set)                       // foreach delimiters
        patterns (str, pattern)                     // foreach patterns
Please note that any 'pattern' referred to within this module refers to a pattern of characters, and not some kind of regex descriptor. Use the Regex module for regex operation.

T[] trim(T)(T[] source);
Trim the provided array by stripping whitespace from both ends. Returns a slice of the original content

T[] triml(T)(T[] source);
Trim the provided array by stripping whitespace from the left. Returns a slice of the original content

T[] trimr(T)(T[] source);
Trim the provided array by stripping whitespace from the right. Returns a slice of the original content
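"Returns a slice of the original content" means no copy is made. The sketch below illustrates this with hypothetical stand-ins (trimSketch, isSpaceSketch); the exact whitespace set is an assumption of the example.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical whitespace test; assumed to cover space, \t, \n, \v, \f, \r.
bool isSpaceSketch(dchar c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// Hypothetical stand-in for trim(): narrows the slice from both ends.
T[] trimSketch(T)(T[] source)
{
    size_t head = 0, tail = source.length;
    while (head < tail && isSpaceSketch(source[head]))
        ++head;
    while (tail > head && isSpaceSketch(source[tail - 1]))
        --tail;
    return source[head .. tail];
}

unittest
{
    char[] buf = "  foo \n".dup;
    auto t = trimSketch(buf);
    assert(t == "foo");

    // No copy was made: the result points into the original buffer,
    assert(t.ptr == buf.ptr + 2);

    // so writing through the result changes the original as well.
    t[0] = 'F';
    assert(buf == "  Foo \n");

    // An all-whitespace input yields an empty slice.
    assert(trimSketch("   ") == "");
}
```

Callers that need an independent copy should .dup the result.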

T[] strip(T, S)(T[] source, S match);
Trim the given array by stripping elements equal to match from both ends. Returns a slice of the original content.

T[] stripl(T, S)(T[] source, S match);
Trim the given array by stripping elements equal to match from the left-hand side. Returns a slice of the original content.

T[] stripr(T, S)(T[] source, S match);
Trim the given array by stripping elements equal to match from the right-hand side. Returns a slice of the original content.

T[] chopl(T, S)(T[] source, S match);
Chop the given source by removing the pattern match from the left-hand side, if present. Returns a slice of the original content.

T[] chopr(T, S)(T[] source, S match);
Chop the given source by removing the pattern match from the right-hand side, if present. Returns a slice of the original content.
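The strip family matches a single element, whereas the chop family matches a pattern (an array). The difference can be sketched as follows. Both functions are hypothetical stand-ins; whether a repeated prefix is chopped more than once is not stated above, and this sketch assumes it is removed at most once.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for strip(): drops every leading and trailing
// element equal to match.
T[] stripSketch(T, S)(T[] source, S match)
{
    size_t head = 0, tail = source.length;
    while (head < tail && source[head] == match)
        ++head;
    while (tail > head && source[tail - 1] == match)
        --tail;
    return source[head .. tail];
}

// Hypothetical stand-in for chopl(): drops a leading pattern.
// Assumption: the pattern is removed at most once.
T[] choplSketch(T, S)(T[] source, S match)
{
    if (match.length <= source.length && source[0 .. match.length] == match)
        return source[match.length .. $];
    return source;
}

unittest
{
    assert(stripSketch("--flag--", '-') == "flag");
    assert(stripSketch([0, 0, 7, 0], 0) == [7]);   // non-text arrays work too

    assert(choplSketch("hello world", "hello ") == "world");
    assert(choplSketch("hello world", "world") == "hello world"); // not a prefix
}
```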

T[] replace(T, S)(T[] source, S match, S replacement);
Replace all instances of one element with another (in place)
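"In place" implies that the source must be mutable and that the caller's array is modified. A minimal sketch, using the hypothetical stand-in replaceSketch rather than the module's own code:

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for replace(): overwrites matching elements in place.
T[] replaceSketch(T, S)(T[] source, S match, S replacement)
{
    foreach (ref c; source)
        if (c == match)
            c = replacement;
    return source;
}

unittest
{
    // A string literal is immutable, so take a mutable copy first.
    char[] path = "a/b/c".dup;
    auto result = replaceSketch(path, '/', '.');

    assert(result == "a.b.c");
    assert(path == "a.b.c");          // the caller's array was modified
    assert(result.ptr is path.ptr);   // and the result aliases it
}
```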

T[] substitute(T)(const(T)[] source, const(T)[] match, const(T)[] replacement);
Substitute all instances of match in source with replacement. Set replacement to null in order to remove matches instead of replacing them.

size_t count(T)(const(T)[] source, const(T)[] match);
Count all instances of match within source

bool contains(T)(const(T)[] source, const(T) match);
Returns whether or not the provided array contains an instance of the given match element

bool containsPattern(T)(const(T)[] source, const(T)[] match);
Returns whether or not the provided array contains an instance of the given match pattern

size_t index(T)(const(T)[] source, const(T)[] match, size_t start = 0);
Return the index of the next instance of 'match' starting at position 'start', or source.length where there is no match.

Parameter 'start' defaults to 0

size_t rindex(T)(const(T)[] source, const(T)[] match, size_t start = size_t.max);
Return the index of the prior instance of 'match' starting just before 'start', or source.length where there is no match.

Parameter 'start' defaults to size_t.max, which stands for source.length

size_t locate(T)(const(T)[] source, const(T) match, size_t start = 0);
Return the index of the next instance of 'match' starting at position 'start', or source.length where there is no match.

Parameter 'start' defaults to 0

size_t locatePrior(T)(const(T)[] source, const(T) match, size_t start = size_t.max);
Return the index of the prior instance of 'match' starting just before 'start', or source.length where there is no match.

Parameter 'start' defaults to size_t.max, which stands for source.length

size_t locatePattern(T)(const(T)[] source, const(T)[] match, size_t start = 0);
Return the index of the next instance of 'match' starting at position 'start', or source.length where there is no match.

Parameter 'start' defaults to 0

size_t locatePatternPrior(T)(const(T)[] source, const(T)[] match, size_t start = size_t.max);
Return the index of the prior instance of 'match' starting just before 'start', or source.length where there is no match.

Parameter 'start' defaults to size_t.max, which stands for source.length
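The "prior" variants search backwards from just before 'start'. The sketch below shows how a previous hit can be fed back in as the next 'start'. locatePriorSketch is a hypothetical stand-in, and clamping an oversized 'start' to the array length is an assumption of the sketch.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for locatePrior(): searches backwards from just
// before 'start'; returns source.length when nothing matches.
size_t locatePriorSketch(T)(const(T)[] source, const(T) match,
                            size_t start = size_t.max)
{
    if (start > source.length)
        start = source.length;      // assumed clamp for the default value

    while (start > 0)
        if (source[--start] == match)
            return start;
    return source.length;
}

unittest
{
    auto path = "/usr/local/bin";

    // Last separator: the search begins at the end by default.
    auto last = locatePriorSketch(path, '/');
    assert(last == 10);
    assert(path[last + 1 .. $] == "bin");

    // Passing the previous hit as 'start' finds the separator before it.
    assert(locatePriorSketch(path, '/', last) == 4);

    // A miss still reports source.length, never -1.
    assert(locatePriorSketch(path, '#') == path.length);
}
```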

T[] head(T, S)(T[] src, S[] pattern, out T[] tail);
Split the provided array on the first pattern instance, and return the resultant head and tail. The pattern is excluded from the two segments.

Where the pattern is not found, tail will be null and the return value will be the original array.

T[] tail(T, S)(T[] src, S[] pattern, out T[] head);
Split the provided array on the last pattern instance, and return the resultant head and tail. The pattern is excluded from the two segments.

Where the pattern is not found, head will be null and the return value will be the original array.
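A typical use of head() is splitting a key from its value. The sketch below reproduces the documented behaviour with hypothetical stand-ins (headSketch, findPatternSketch) so that it compiles without Tango.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical helper: index of the first pattern instance, or src.length.
size_t findPatternSketch(T, S)(T[] src, S[] pattern)
{
    if (pattern.length && pattern.length <= src.length)
    {
        foreach (i; 0 .. src.length - pattern.length + 1)
        {
            if (src[i .. i + pattern.length] == pattern)
                return i;
        }
    }
    return src.length;
}

// Hypothetical stand-in for head(): splits on the first pattern instance.
T[] headSketch(T, S)(T[] src, S[] pattern, out T[] tail)
{
    auto i = findPatternSketch(src, pattern);
    if (i != src.length)
    {
        tail = src[i + pattern.length .. $];
        src = src[0 .. i];
    }
    return src;
}

unittest
{
    string value;
    auto key = headSketch("name=Kris", "=", value);
    assert(key == "name" && value == "Kris");

    // No separator: the input comes back whole and the tail stays null.
    string rest;
    assert(headSketch("standalone", "=", rest) == "standalone");
    assert(rest is null);
}
```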

T[][] delimit(T, M)(T[] src, const(M)[] set);
Split the provided array wherever a delimiter-set instance is found, and return the resultant segments. The delimiters are excluded from each of the segments. Note that delimiters are matched as a set of alternates rather than as a pattern.

Splitting on a single delimiter is considerably faster than splitting upon a set of alternatives.

Note that the src content is not duplicated by this function, but is sliced instead.

inout(T)[][] split(T)(inout(T)[] src, const(T)[] pattern);
Split the provided array wherever a pattern instance is found, and return the resultant segments. The pattern is excluded from each of the segments.

Note that the src content is not duplicated by this function, but is sliced instead.
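The difference between a delimiter set (delimit) and a pattern (split) can be sketched as follows. Both functions are hypothetical stand-ins that follow the descriptions above; the handling of empty segments is a property of the sketch, not a statement about the module.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for delimit(): cuts at any single element of set.
T[][] delimitSketch(T, M)(T[] src, M[] set)
{
    T[][] result;
    size_t mark = 0;
    foreach (i, c; src)
    {
        bool isDelim = false;
        foreach (d; set)
            if (c == d) { isDelim = true; break; }
        if (isDelim)
        {
            result ~= src[mark .. i];
            mark = i + 1;
        }
    }
    result ~= src[mark .. $];
    return result;
}

// Hypothetical stand-in for split(): cuts where the whole pattern occurs.
T[][] splitSketch(T, M)(T[] src, M[] pattern)
{
    T[][] result;
    size_t mark = 0, i = 0;
    while (pattern.length && i + pattern.length <= src.length)
    {
        if (src[i .. i + pattern.length] == pattern)
        {
            result ~= src[mark .. i];
            i += pattern.length;
            mark = i;
        }
        else
            ++i;
    }
    result ~= src[mark .. $];
    return result;
}

unittest
{
    // "-=" as a set of alternates: cut at '-' and at '='.
    assert(delimitSketch("a-b=c", "-=") == ["a", "b", "c"]);

    // "-=" as a pattern: it never occurs, so there is a single segment.
    assert(splitSketch("a-b=c", "-=") == ["a-b=c"]);
    assert(splitSketch("one, two, three", ", ") == ["one", "two", "three"]);

    // Segments are slices of the source, not copies.
    auto src = "one,two";
    assert(delimitSketch(src, ",")[1].ptr == src.ptr + 4);
}
```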

alias splitLines;
Convert text into a set of lines, where each line is identified by a \n or \r\n combination. The line terminator is stripped from each resultant array

Note that the src content is not duplicated by this function, but is sliced instead.

T[] lineOf(T)(T[] src, size_t index);
Return the indexed line, where each line is identified by a \n or \r\n combination. The line terminator is stripped from the resultant line

Note that src content is not duplicated by this function, but is sliced instead.

T[] join(T)(const(T[])[] src, const(T)[] postfix = null, T[] dst = null);
Combine a series of text segments together, each appended with a postfix pattern. An optional output buffer can be provided to avoid heap activity - it should be large enough to contain the entire output, otherwise the heap will be used instead.

Returns a valid slice of the output, containing the concatenated text.

T[] prefix(T)(T[] dst, const(T)[] prefix, const(T[])[] src...);
Combine a series of text segments together, each prepended with a prefix pattern. An optional output buffer can be provided to avoid heap activity - it should be large enough to contain the entire output, otherwise the heap will be used instead.

Note that, unlike join(), the output buffer is specified first such that a set of trailing strings can be provided.

Returns a valid slice of the output, containing the concatenated text.

T[] postfix(T)(T[] dst, const(T)[] postfix, const(T[])[] src...);
Combine a series of text segments together, each appended with an optional postfix pattern. An optional output buffer can be provided to avoid heap activity - it should be large enough to contain the entire output, otherwise the heap will be used instead.

Note that, unlike join(), the output buffer is specified first such that a set of trailing strings can be provided.

Returns a valid slice of the output, containing the concatenated text.

T[] combine(T)(T[] dst, const(T)[] prefix, const(T)[] postfix, const(T[])[] src...);
Combine a series of text segments together, each prefixed and/or postfixed with optional strings. An optional output buffer can be provided to avoid heap activity - it should be large enough to contain the entire output, otherwise the heap will be used instead.

Note that, unlike join(), the output buffer is specified first such that a set of trailing strings can be provided.

Returns a valid slice of the output, containing the concatenated text.

T[] repeat(T)(const(T)[] src, size_t count, T[] dst = null);
Repeat an array for a specific number of times. An optional output buffer can be provided to avoid heap activity - it should be large enough to contain the entire output, otherwise the heap will be used instead.

Returns a valid slice of the output, containing the concatenated text.
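The optional output buffer shared by join(), prefix(), postfix(), combine() and repeat() follows one pattern: write into the caller's buffer when it is large enough, otherwise allocate. A minimal sketch of that pattern, using the hypothetical stand-in repeatSketch; the allocation strategy shown is an assumption.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for repeat(): fills dst when it is large enough,
// otherwise falls back to a heap allocation.
T[] repeatSketch(T)(const(T)[] src, size_t count, T[] dst = null)
{
    auto len = src.length * count;
    if (dst.length < len)
        dst = new T[len];               // buffer absent or too small

    for (size_t i = 0; i < len; i += src.length)
        dst[i .. i + src.length] = src[];

    return dst[0 .. len];               // only the valid part is returned
}

unittest
{
    char[16] stack;
    auto r = repeatSketch("ab", 3, stack[]);
    assert(r == "ababab");
    assert(r.ptr is stack.ptr);         // written into the caller's buffer

    char[4] tiny;
    auto h = repeatSketch("ab", 3, tiny[]);
    assert(h == "ababab");
    assert(h.ptr !is tiny.ptr);         // too small, so the heap was used
}
```

Always use the returned slice rather than the buffer itself, since the result may live on the heap and is usually shorter than the buffer.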

bool isSpace(T)(T c);
Is the argument a whitespace character?

bool matching(T)(const(T)* s1, const(T)* s2, size_t length);
Return whether or not the two arrays have matching content

size_t indexOf(T)(const(T)* str, const(T) match, size_t length);
Returns the index of the first match in str, failing once length is reached. Note that we return 'length' for failure and a 0-based index on success

size_t mismatch(T)(const(T)* s1, const(T)* s2, size_t length);
Returns the index of a mismatch between s1 & s2, failing when length is reached. Note that we return 'length' upon failure (array content matches) and a 0-based index upon success.

Use this as a faster opEquals. Also provides the basis for a faster opCmp, since the index of the first mismatched character can be used to determine the return value
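How a mismatch index can drive a three-way comparison is sketched below. mismatchSketch and compareSketch are hypothetical names, and the tie-break on length is an assumption of the example.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical stand-in for mismatch(): index of the first differing
// element, or 'length' when the content matches.
size_t mismatchSketch(T)(const(T)* s1, const(T)* s2, size_t length)
{
    foreach (i; 0 .. length)
        if (s1[i] != s2[i])
            return i;
    return length;
}

// Hypothetical three-way compare built on the mismatch index.
int compareSketch(T)(const(T)[] a, const(T)[] b)
{
    auto n = a.length < b.length ? a.length : b.length;
    auto i = mismatchSketch(a.ptr, b.ptr, n);
    if (i < n)
        return a[i] < b[i] ? -1 : 1;

    // The common prefix is identical: the shorter array sorts first.
    return a.length < b.length ? -1 : (a.length > b.length ? 1 : 0);
}

unittest
{
    assert(mismatchSketch("abcd".ptr, "abxd".ptr, 4) == 2);
    assert(mismatchSketch("abcd".ptr, "abcd".ptr, 4) == 4); // 'length' means equal

    assert(compareSketch("abc", "abd") < 0);
    assert(compareSketch("abc", "ab") > 0);
    assert(compareSketch("abc", "abc") == 0);
}
```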

LineFruct!(T) lines(T)(T[] src);
Iterator to isolate lines.

Converts text into a set of lines, where each line is identified by a \n or \r\n combination. The line terminator is stripped from each resultant array.

        foreach (line; lines ("one\ntwo\nthree"))
                 ...


DelimFruct!(T,M) delimiters(T, M)(T[] src, const(M)[] set);
Iterator to isolate text elements.

Splits the provided array wherever a delimiter-set instance is found, and returns the resultant segments. The delimiters are excluded from each of the segments. Note that delimiters are matched as a set of alternates rather than as a pattern.

Splitting on a single delimiter is considerably faster than splitting upon a set of alternatives.

        foreach (segment; delimiters ("one,two;three", ",;"))
                 ...


PatternFruct!(T) patterns(T)(const(T)[] src, const(T)[] pattern, const(T)[] sub = null);
Iterator to isolate text elements.

Splits the provided array wherever a pattern instance is found, and returns the resultant segments. Patterns are excluded from each of the segments, and an optional sub argument enables replacement.

        foreach (segment; patterns ("one, two, three", ", "))
                 ...


QuoteFruct!(T,M) quotes(T, M)(T[] src, const(M)[] set);
Iterator to isolate optionally quoted text elements.

As per delimiters(), but with the extension of being quote-aware; the set of delimiters is ignored inside a pair of quotes. Note that an unterminated quote will consume the remaining content.

        foreach (quote; quotes ("one two 'three four' five", " "))
                 ...


T[] layout(T)(T[] output, const(T[])[] layout...);
Arranges text strings in order, using indices to specify where each particular argument should be positioned within the text. This is handy for collating I18N components, or as a simplistic and lightweight formatter. Indices range from zero through nine.

        // write ordered text to the console
        char[64] tmp;

        Cout (layout (tmp, "%1 is after %0", "zero", "one")).newline;
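The index substitution itself can be sketched as follows. layoutSketch is a hypothetical, simplified stand-in: unlike layout() it appends to a heap array instead of filling a caller-supplied buffer, and its handling of an out-of-range index (an assert) is an assumption.

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical, simplified stand-in for layout(): replaces %0 .. %9 with
// the argument at that position.
char[] layoutSketch(const(char)[] format, const(char[])[] args...)
{
    char[] output;
    for (size_t i = 0; i < format.length; ++i)
    {
        auto c = format[i];
        if (c == '%' && i + 1 < format.length
            && format[i + 1] >= '0' && format[i + 1] <= '9')
        {
            size_t index = format[i + 1] - '0';
            assert(index < args.length, "index out of range");
            output ~= args[index];
            ++i;                        // skip the digit
        }
        else
            output ~= c;
    }
    return output;
}

unittest
{
    // Arguments may appear in any order, which suits translated text.
    assert(layoutSketch("%1 is after %0", "zero", "one") == "one is after zero");
    assert(layoutSketch("%0 and %0", "x") == "x and x");
}
```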


inout(T)[] unescape(T)(inout(T)[] src, T[] dst = null);
Convert 'escaped' chars to normal ones: for example, the two-character sequence \t becomes a single tab character. Supports \" \' \\ \a \b \f \n \r \t \v

pure nothrow @trusted size_t jhash(const(ubyte)* k, size_t len, size_t c = 0);
pure nothrow @trusted size_t jhash(const(void)[] x, size_t c = 0);
jhash() -- hash a variable-length key into a 32-bit value

k   : the key (the unaligned variable-length array of bytes)
len : the length of the key, counting by bytes
c   : can be any 4-byte value (referred to as 'level' in this description)

Returns a 32-bit value. Every bit of the key affects every bit of the return value. Every 1-bit and 2-bit delta achieves avalanche.

About 4.3*len + 80 X86 instructions, with excellent pipelining

The best hash table sizes are powers of 2. There is no need to do mod a prime (mod is sooo slow!). If you need less than 32 bits, use a bitmask. For example, if you need only 10 bits, do

        h = (h & hashmask(10));

In which case, the hash table should have hashsize(10) elements. If you are hashing n strings (ub1 **)k, do it like this:

        for (i=0, h=0; i<n; ++i) h = hash( k[i], len[i], h);
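The names hashsize and hashmask above are not defined by this module; a D sketch of the same idea is given below. hashSizeSketch and hashMaskSketch are hypothetical helpers, and the constant stands in for a value returned by jhash().

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical helpers: number of slots in a table addressed by 'bits'
// bits, and the matching mask.
size_t hashSizeSketch(uint bits) { return cast(size_t) 1 << bits; }
size_t hashMaskSketch(uint bits) { return hashSizeSketch(bits) - 1; }

unittest
{
    size_t h = 0xDEADBEEF;              // stand-in for a jhash() result

    auto slot = h & hashMaskSketch(10);

    assert(hashSizeSketch(10) == 1024);
    assert(slot < hashSizeSketch(10));  // always a valid table index

    // For a power-of-two table the mask gives the same slot as a modulo.
    assert(slot == h % hashSizeSketch(10));
}
```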

By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net. You may use this code any way you wish, private, educational, or commercial. It's free.

See http://burtleburtle.net/bob/hash/evahash.html

Use for hash table lookup, or anything where one collision in 2^32 is acceptable. Do NOT use for cryptographic purposes.

struct LineFruct(T);
Helper fruct for iterator lines(). A fruct is a low-impact mechanism for capturing context relating to an opApply (the name is a blend of 'foreach' and 'struct')

struct DelimFruct(T,M);
Helper fruct for iterator delimiters(). A fruct is a low-impact mechanism for capturing context relating to an opApply (the name is a blend of 'foreach' and 'struct')

struct PatternFruct(T);
Helper fruct for iterator patterns(). A fruct is a low-impact mechanism for capturing context relating to an opApply (the name is a blend of 'foreach' and 'struct')

struct QuoteFruct(T,M);
Helper fruct for iterator quotes(). A fruct is a low-impact mechanism for capturing context relating to an opApply (the name is a blend of 'foreach' and 'struct')
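The fruct idea can be sketched as a small struct that stores the iteration context and exposes opApply, returned by a factory function so that foreach can consume it directly. LineFructSketch and linesSketch are hypothetical stand-ins; how trailing terminators are treated is a property of the sketch, not a statement about lines().

```d
// Build and run with: dmd -unittest -main -run sketch.d

// Hypothetical fruct: remembers its input and exposes opApply.
struct LineFructSketch(T)
{
    private T[] src;

    int opApply(scope int delegate(ref T[] line) dg)
    {
        int ret = 0;
        size_t mark = 0;
        foreach (i, c; src)
        {
            if (c == '\n')
            {
                // Drop a preceding '\r' so that "\r\n" counts as one terminator.
                auto end = (i > mark && src[i - 1] == '\r') ? i - 1 : i;
                auto line = src[mark .. end];
                if ((ret = dg(line)) != 0)
                    return ret;         // the loop body used break or return
                mark = i + 1;
            }
        }
        auto last = src[mark .. $];     // text after the final terminator
        return dg(last);
    }
}

// Factory function, so that foreach can be written against a plain call.
LineFructSketch!T linesSketch(T)(T[] src)
{
    return LineFructSketch!T(src);
}

unittest
{
    string[] seen;
    foreach (line; linesSketch("one\r\ntwo\nthree"))
        seen ~= line;
    assert(seen == ["one", "two", "three"]);

    // break works because opApply propagates the delegate's non-zero result.
    size_t n;
    foreach (line; linesSketch("a\nb\nc"))
    {
        ++n;
        if (line == "b")
            break;
    }
    assert(n == 2);
}
```

No array of segments is allocated: each line is produced as a slice of the source while the loop runs.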


Page generated by Ddoc. Copyright (c) 2004 Kris Bell. All rights reserved