com.ibm.icu.lang
public final class UCharacter extends Object implements ECharacterCategory, ECharacterDirection
The UCharacter class provides extensions to the java.lang.Character class. These extensions provide support for more Unicode properties and together with the UTF16 class, provide support for supplementary characters (those with code points above U+FFFF). Each ICU release supports the latest version of Unicode available at that time.
Code points are represented in this API as ints. While it would be more convenient in Java to have a separate primitive data type for them, ints suffice in the meantime.
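A Java char holds only one UTF-16 code unit, so a supplementary code point such as U+1F600 does not fit in it. The minimal sketch below shows the int-to-UTF-16 relationship that the UCharacter and UTF16 APIs are built around. It uses only java.lang.Character so that it runs without ICU, and the class and helper names are hypothetical.

```java
// Sketch using only the JDK: a code point is an int because values above
// U+FFFF need two UTF-16 chars (a surrogate pair).
final class CodePointAsInt {
    /** Returns the UTF-16 code units for a code point: one for BMP, two otherwise. */
    static char[] utf16Units(int codePoint) {
        // Throws IllegalArgumentException if codePoint is not a valid code point.
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int grinningFace = 0x1F600;
        char[] units = utf16Units(grinningFace);
        System.out.println(units.length); // 2
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DE00
    }
}
```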
To use this class, add the jar file icu4j.jar to the
class path, since it contains the data files that supply the information used
by this class.
For example, on Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
Alternatively, copy the files uprops.dat and
unames.icu from the icu4j source subdirectory
$ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory
$ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode properties, the main differences between UCharacter and Character are:
Further differences in detail can be determined with the program com.ibm.icu.dev.test.lang.UCharacterCompare.
In addition to Java compatibility functions, which calculate derived properties, this API provides low-level access to the Unicode Character Database.
Unicode assigns each code point (not just each assigned character) values for many properties. Most of them are simple boolean flags or constants from a small enumerated list. For some properties, values are strings or other, more complex types.
For more information see "About the Unicode Character Database" (http://www.unicode.org/ucd/) and the ICU User Guide chapter on Properties (http://icu.sourceforge.net/userguide/properties.html).
There are also functions that provide easy migration from C/POSIX functions like isblank(). Their use is generally discouraged because the C/POSIX standards do not define their semantics beyond the ASCII range, which means that different implementations exhibit very different behavior. Instead, Unicode properties should be used directly.
There are also only a few, broad C/POSIX character classes, and they tend to be used for conflicting purposes. For example, the "isalpha()" class is sometimes used to determine word boundaries, while a more sophisticated approach would at least distinguish initial letters from continuation characters (the latter including combining marks). (In ICU, BreakIterator is the most sophisticated API for word boundaries.) Another example: There is no "istitle()" class for titlecase characters.
ICU 3.4 and later provides API access for all twelve C/POSIX character classes. ICU implements them according to the Standard Recommendations in Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
API access for C/POSIX character classes is as follows:
- alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
- lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
- upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
- punct: ((1<
The C/POSIX character classes are also available in UnicodeSet patterns,
using patterns like [:graph:] or \p{graph}.
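The ASCII-only problem described above is easy to observe in the JDK's own regular expressions. In this sketch (java.util.regex, not ICU; the class and method names are hypothetical), \p{Alpha} is US-ASCII-only by default, while the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7 and later) switches the POSIX classes to the UTS #18 Annex C definitions, the same recommendations ICU follows.

```java
import java.util.regex.Pattern;

// Sketch with java.util.regex (not ICU): the same POSIX class name can mean
// ASCII-only or Unicode-aware semantics depending on configuration.
final class PosixClassDemo {
    private static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}");
    // UNICODE_CHARACTER_CLASS (Java 7+) makes \p{Alpha} follow UTS #18 Annex C.
    private static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean asciiAlpha(String s) {
        return ASCII_ALPHA.matcher(s).matches();
    }

    static boolean unicodeAlpha(String s) {
        return UNICODE_ALPHA.matcher(s).matches();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(asciiAlpha(eAcute));   // false: default class is US-ASCII only
        System.out.println(unicodeAlpha(eAcute)); // true: Alphabetic property
    }
}
```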
Note: There are several ICU (and Java) whitespace functions.
Comparison:
- isUWhiteSpace = UProperty.WHITE_SPACE: Unicode White_Space property;
most of general category "Z" (separators) plus most whitespace ISO controls
(including no-break spaces, but excluding IS1..IS4 and ZWSP)
- isWhitespace: Java isWhitespace; Z plus whitespace ISO controls, but excluding no-break spaces
- isSpaceChar: just Z (including no-break spaces)
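The three-way comparison above can be checked with a JDK-only sketch. It assumes Java 7 or later, where java.util.regex accepts \p{IsWhite_Space} for the Unicode White_Space property. That regex stands in for ICU's isUWhiteSpace so the example runs without icu4j.jar. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

final class WhitespaceDemo {
    // Java 7+ regex binary property for Unicode White_Space (the property behind
    // ICU's isUWhiteSpace); used here so the sketch runs without ICU.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    static boolean unicodeWhiteSpace(int cp) {
        return WHITE_SPACE.matcher(new String(Character.toChars(cp))).matches();
    }

    public static void main(String[] args) {
        int[] samples = {0x0009, 0x0020, 0x00A0, 0x001F};
        for (int cp : samples) {
            System.out.printf("U+%04X White_Space=%b isWhitespace=%b isSpaceChar=%b%n",
                    cp, unicodeWhiteSpace(cp),
                    Character.isWhitespace(cp), Character.isSpaceChar(cp));
        }
        // Expected: U+0009 true/true/false, U+0020 true/true/true,
        //           U+00A0 true/false/true, U+001F false/true/false
    }
}
```

U+00A0 (no-break space) and U+001F (IS1) are the code points where the three functions disagree, matching the comparison list.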
This class is not subclassable.
See Also: com.ibm.icu.lang.UCharacterEnums
UNKNOWN: ICU 2.1
Nested Class Summary | |
---|---|
static interface | UCharacter.DecompositionType
Decomposition Type constants. |
static interface | UCharacter.EastAsianWidth
East Asian Width constants. |
static interface | UCharacter.GraphemeClusterBreak
Grapheme Cluster Break constants. |
static interface | UCharacter.HangulSyllableType
Hangul Syllable Type constants.
|
static interface | UCharacter.JoiningGroup
Joining Group constants. |
static interface | UCharacter.JoiningType
Joining Type constants. |
static interface | UCharacter.LineBreak
Line Break constants. |
static interface | UCharacter.NumericType
Numeric Type constants. |
static interface | UCharacter.SentenceBreak
Sentence Break constants. |
static class | UCharacter.UnicodeBlock
A family of character subsets representing the character blocks in the
Unicode specification, generated from Unicode Data file Blocks.txt.
|
static interface | UCharacter.WordBreak
Word Break constants. |
Field Summary | |
---|---|
static int | FOLD_CASE_DEFAULT
Option value for case folding: use default mappings defined in CaseFolding.txt. |
static int | FOLD_CASE_EXCLUDE_SPECIAL_I
Option value for case folding: exclude the mappings for dotted I
and dotless i marked with 'I' in CaseFolding.txt. |
static int | MAX_CODE_POINT
Cover the JDK 1.5 API, for convenience. |
static char | MAX_HIGH_SURROGATE
Cover the JDK 1.5 API, for convenience. |
static char | MAX_LOW_SURROGATE
Cover the JDK 1.5 API, for convenience. |
static int | MAX_RADIX
Compatibility constant for Java Character's MAX_RADIX. |
static char | MAX_SURROGATE
Cover the JDK 1.5 API, for convenience. |
static int | MAX_VALUE
The highest Unicode code point value (scalar value) according to the
Unicode Standard.
|
static int | MIN_CODE_POINT
Cover the JDK 1.5 API, for convenience. |
static char | MIN_HIGH_SURROGATE
Cover the JDK 1.5 API, for convenience. |
static char | MIN_LOW_SURROGATE
Cover the JDK 1.5 API, for convenience. |
static int | MIN_RADIX
Compatibility constant for Java Character's MIN_RADIX. |
static int | MIN_SUPPLEMENTARY_CODE_POINT
Cover the JDK 1.5 API, for convenience. |
static char | MIN_SURROGATE
Cover the JDK 1.5 API, for convenience. |
static int | MIN_VALUE
The lowest Unicode code point value. |
static double | NO_NUMERIC_VALUE
Special value that is returned by getUnicodeNumericValue(int) when no
numeric value is defined for a code point. |
static int | REPLACEMENT_CHAR
Unicode value used when translating into a Unicode encoding form and there
is no corresponding character. |
static int | SUPPLEMENTARY_MIN_VALUE
The minimum value for supplementary code points. |
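Most of the constants above mirror java.lang.Character fields with the same or similar names (MIN_CODE_POINT, MAX_CODE_POINT, MIN_SUPPLEMENTARY_CODE_POINT, MIN_SURROGATE, MAX_SURROGATE). The sketch below uses those JDK fields to classify an int. It also shows U+FFFD, the value held by REPLACEMENT_CHAR, being substituted when the JDK decodes malformed UTF-8. The classify and decodeUtf8 helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class CodePointRanges {
    /** Classifies an int using the JDK counterparts of the constants above. */
    static String classify(int cp) {
        if (cp < Character.MIN_CODE_POINT || cp > Character.MAX_CODE_POINT) {
            return "invalid";       // outside U+0000..U+10FFFF
        }
        if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
            return "surrogate";     // U+D800..U+DFFF, not a scalar value
        }
        if (cp >= Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            return "supplementary"; // U+10000 and above
        }
        return "BMP";
    }

    /** Decodes UTF-8; the JDK replaces malformed input with U+FFFD. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(classify(0x41));     // BMP
        System.out.println(classify(0xD800));   // surrogate
        System.out.println(classify(0x1F600));  // supplementary
        System.out.println(classify(0x110000)); // invalid
        System.out.println(Integer.toHexString(decodeUtf8(new byte[] {(byte) 0xFF}).charAt(0))); // fffd
    }
}
```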
Method Summary | |
---|---|
static int | charCount(int cp)
Cover the JDK 1.5 API, for convenience. |
static int | codePointAt(CharSequence seq, int index)
Cover the JDK 1.5 API, for convenience. |
static int | codePointAt(char[] text, int index)
Cover the JDK 1.5 API, for convenience. |
static int | codePointAt(char[] text, int index, int limit)
Cover the JDK 1.5 API, for convenience. |
static int | codePointBefore(CharSequence seq, int index)
Cover the JDK 1.5 API, for convenience. |
static int | codePointBefore(char[] text, int index)
Cover the JDK 1.5 API, for convenience. |
static int | codePointBefore(char[] text, int index, int limit)
Cover the JDK 1.5 API, for convenience. |
static int | codePointCount(CharSequence text, int start, int limit)
Cover the JDK API, for convenience. |
static int | codePointCount(char[] text, int start, int limit)
Cover the JDK API, for convenience. |
static int | digit(int ch, int radix)
Retrieves the numeric value of a decimal digit code point.
|
static int | digit(int ch)
Retrieves the numeric value of a decimal digit code point.
|
static int | foldCase(int ch, boolean defaultmapping)
The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
|
static String | foldCase(String str, boolean defaultmapping)
The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
|
static int | foldCase(int ch, int options)
The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
|
static String | foldCase(String str, int options)
The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
|
static char | forDigit(int digit, int radix)
Provide the java.lang.Character forDigit API, for convenience. |
static VersionInfo | getAge(int ch) Get the "age" of the code point. The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character. |
static int | getCharFromExtendedName(String name) Find a Unicode character by its name and return its code point value. |
static int | getCharFromName(String name) Find a Unicode code point by its most current Unicode name and return its code point value. |
static int | getCharFromName1_0(String name) Find a Unicode character by its version 1.0 Unicode name and return its code point value. |
static int | getCodePoint(char lead, char trail)
Returns a code point corresponding to the two UTF16 characters. |
static int | getCodePoint(char char16)
Returns the code point corresponding to the UTF16 character. |
static int | getCombiningClass(int ch)
Gets the combining class of the argument code point. |
static int | getDirection(int ch)
Returns the bidirectional property of a code point.
|
static byte | getDirectionality(int cp)
Cover the JDK API, for convenience. |
static String | getExtendedName(int ch) Retrieves a name for a valid codepoint. |
static ValueIterator | getExtendedNameIterator() Gets an iterator for character names, iterating over codepoints. This API only gets the iterator for the extended names. |
static int | getHanNumericValue(int ch)
Return numeric value of Han code points.
|
static int | getIntPropertyMaxValue(int type)
Get the maximum value for an integer/binary Unicode property.
|
static int | getIntPropertyMinValue(int type)
Get the minimum value for an integer/binary Unicode property type.
|
static int | getIntPropertyValue(int ch, int type) Gets the property value for a Unicode property type of a code point. |
static String | getISOComment(int ch)
Get the ISO 10646 comment for a character.
|
static int | getMirror(int ch)
Maps the specified code point to a "mirror-image" code point.
|
static String | getName(int ch)
Retrieve the most current Unicode name of the argument code point, or
null if the character is unassigned, outside the range
UCharacter.MIN_VALUE to UCharacter.MAX_VALUE, or does not have a name.
|
static String | getName(String s, String separator)
Gets the names for each of the characters in a string |
static String | getName1_0(int ch)
Retrieve the earlier version 1.0 Unicode name of the argument code
point, or null if the character is unassigned, outside the range
UCharacter.MIN_VALUE to UCharacter.MAX_VALUE, or does not have a name.
|
static ValueIterator | getName1_0Iterator() Gets an iterator for character names, iterating over codepoints. This API only gets the iterator for the older 1.0 Unicode names. |
static ValueIterator | getNameIterator() Gets an iterator for character names, iterating over codepoints. This API only gets the iterator for the modern, most up-to-date Unicode names. |
static int | getNumericValue(int ch)
Returns the numeric value of the code point as a nonnegative
integer.
|
static int | getPropertyEnum(String propertyAlias)
Return the UProperty selector for a given property name, as
specified in the Unicode database file PropertyAliases.txt.
|
static String | getPropertyName(int property, int nameChoice)
Return the Unicode name for a given property, as given in the
Unicode database file PropertyAliases.txt. |
static int | getPropertyValueEnum(int property, String valueAlias)
Return the property value integer for a given value name, as
specified in the Unicode database file PropertyValueAliases.txt.
|
static String | getPropertyValueName(int property, int value, int nameChoice)
Return the Unicode name for a given property value, as given in
the Unicode database file PropertyValueAliases.txt. |
static String | getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice)
Returns a string version of the property value. |
static int | getType(int ch)
Returns a value indicating a code point's Unicode category.
|
static RangeValueIterator | getTypeIterator() Gets an iterator for character types, iterating over codepoints. Example of use:RangeValueIterator iterator = UCharacter.getTypeIterator(); RangeValueIterator.Element element = new RangeValueIterator.Element(); while (iterator.next(element)) { System.out.println("Codepoint \\u" + Integer.toHexString(element.start) + " to codepoint \\u" + Integer.toHexString(element.limit - 1) + " has the character type " + element.value); } |
static double | getUnicodeNumericValue(int ch) Get the numeric value for a Unicode code point as defined in the Unicode Character Database. A "double" return type is necessary because some numeric values are fractions, negative, or too large for int. For characters without any numeric values in the Unicode Character Database, this function will return NO_NUMERIC_VALUE. API Change: In release 2.2 and prior, this API had a return type int and returned -1 when the argument ch did not have a corresponding numeric value. |
static VersionInfo | getUnicodeVersion()
Gets the version of Unicode data used. |
static boolean | hasBinaryProperty(int ch, int property) Check a binary Unicode property for a code point. Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt. This API is intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/. For names of Unicode properties see the UCD file PropertyAliases.txt. This API does not check the validity of the code point. Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked with "new" are not available or only partially available. |
static boolean | isBaseForm(int ch)
Determines whether the specified code point is of base form.
|
static boolean | isBMP(int ch)
Determines if the code point is in the BMP plane. |
static boolean | isDefined(int ch)
Determines if a code point has a defined meaning in the up-to-date
Unicode standard.
|
static boolean | isDigit(int ch)
Determines if a code point is a Java digit.
|
static boolean | isHighSurrogate(char ch)
Cover the JDK 1.5 API, for convenience. |
static boolean | isIdentifierIgnorable(int ch)
Determines if the specified code point should be regarded as an
ignorable character in a Unicode identifier.
|
static boolean | isISOControl(int ch)
Determines if the specified code point is an ISO control character.
|
static boolean | isJavaIdentifierPart(int cp)
Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierPart. |
static boolean | isJavaIdentifierStart(int cp)
Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierStart. |
static boolean | isJavaLetter(int cp)
Compatibility override of Java deprecated method. |
static boolean | isJavaLetterOrDigit(int cp)
Compatibility override of Java deprecated method. |
static boolean | isLegal(int ch)
A code point is illegal if and only if
|
static boolean | isLegal(String str)
A string is legal if and only if all its code points are legal.
|
static boolean | isLetter(int ch)
Determines if the specified code point is a letter.
|
static boolean | isLetterOrDigit(int ch)
Determines if the specified code point is a letter or digit.
|
static boolean | isLowerCase(int ch)
Determines if the specified code point is a lowercase character.
|
static boolean | isLowSurrogate(char ch)
Cover the JDK 1.5 API, for convenience. |
static boolean | isMirrored(int ch)
Determines whether the code point has the "mirrored" property.
|
static boolean | isPrintable(int ch)
Determines whether the specified code point is a printable character
according to the Unicode standard. |
static boolean | isSpace(int ch)
Compatibility override of Java deprecated method. |
static boolean | isSpaceChar(int ch)
Determines if the specified code point is a Unicode specified space
character, that is, if the code point is in one of the categories Zs, Zl, or Zp.
|
static boolean | isSupplementary(int ch)
Determines if the code point is a supplementary character.
|
static boolean | isSupplementaryCodePoint(int cp)
Cover the JDK 1.5 API, for convenience. |
static boolean | isSurrogatePair(char high, char low)
Cover the JDK 1.5 API, for convenience. |
static boolean | isTitleCase(int ch)
Determines if the specified code point is a titlecase character.
|
static boolean | isUAlphabetic(int ch) Check if a code point has the Alphabetic Unicode property. Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC). Different from UCharacter.isLetter(ch)! |
static boolean | isULowercase(int ch) Check if a code point has the Lowercase Unicode property. Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE). This is different from UCharacter.isLowerCase(ch)! |
static boolean | isUnicodeIdentifierPart(int ch)
Determines if the specified code point may be any part of a Unicode
identifier other than the starting character.
|
static boolean | isUnicodeIdentifierStart(int ch)
Determines if the specified code point is permissible as the first
character in a Unicode identifier.
|
static boolean | isUpperCase(int ch)
Determines if the specified code point is an uppercase character.
|
static boolean | isUUppercase(int ch) Check if a code point has the Uppercase Unicode property. Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE). This is different from UCharacter.isUpperCase(ch)! |
static boolean | isUWhiteSpace(int ch) Check if a code point has the White_Space Unicode property. Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE). This is different from both UCharacter.isSpace(ch) and UCharacter.isWhitespace(ch)! |
static boolean | isValidCodePoint(int cp)
Cover the JDK 1.5 API, for convenience. |
static boolean | isWhitespace(int ch)
Determines if the specified code point is a white space character.
|
static int | offsetByCodePoints(CharSequence text, int index, int codePointOffset)
Cover the JDK API, for convenience. |
static int | offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset)
Cover the JDK API, for convenience. |
static int | toChars(int cp, char[] dst, int dstIndex)
Cover the JDK 1.5 API, for convenience. |
static char[] | toChars(int cp)
Cover the JDK 1.5 API, for convenience. |
static int | toCodePoint(char high, char low)
Cover the JDK 1.5 API, for convenience. |
static int | toLowerCase(int ch)
The given code point is mapped to its lowercase equivalent; if the code
point has no lowercase equivalent, the code point itself is returned.
|
static String | toLowerCase(String str)
Gets lowercase version of the argument string.
|
static String | toLowerCase(Locale locale, String str)
Gets lowercase version of the argument string.
|
static String | toLowerCase(ULocale locale, String str)
Gets lowercase version of the argument string.
|
static String | toString(int ch)
Converts argument code point and returns a String object representing
the code point's value in UTF16 format.
|
static int | toTitleCase(int ch)
Converts the code point argument to titlecase.
|
static String | toTitleCase(String str, BreakIterator breakiter) Gets the titlecase version of the argument string. The positions for titlecasing are determined by the argument break iterator, so callers can customize the break iterator for specialized titlecasing. |
static String | toTitleCase(Locale locale, String str, BreakIterator breakiter) Gets the titlecase version of the argument string. The positions for titlecasing are determined by the argument break iterator, so callers can customize the break iterator for specialized titlecasing. |
static String | toTitleCase(ULocale locale, String str, BreakIterator titleIter) Gets the titlecase version of the argument string. The positions for titlecasing are determined by the argument break iterator, so callers can customize the break iterator for specialized titlecasing. |
static int | toUpperCase(int ch)
Converts the character argument to uppercase.
|
static String | toUpperCase(String str)
Gets uppercase version of the argument string.
|
static String | toUpperCase(Locale locale, String str)
Gets uppercase version of the argument string.
|
static String | toUpperCase(ULocale locale, String str)
Gets uppercase version of the argument string.
|
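The toTitleCase(String, BreakIterator) overloads above let the break iterator decide where titlecasing starts. The sketch below imitates that idea with the JDK's java.text.BreakIterator rather than ICU's own BreakIterator. It titlecases the first code point of each segment and lowercases the rest. The titleCase helper is hypothetical and ignores the locale handling that ICU's Locale and ULocale overloads provide.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class TitleCaseSketch {
    /**
     * Titlecases the first code point of each segment reported by the break
     * iterator and lowercases the remainder of that segment.
     */
    static String titleCase(String text, BreakIterator breaks) {
        breaks.setText(text);
        StringBuilder out = new StringBuilder(text.length());
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            int first = text.codePointAt(start);
            out.appendCodePoint(Character.toTitleCase(first)); // simple titlecase mapping
            out.append(text.substring(start + Character.charCount(first), end)
                    .toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(titleCase("hello WORLD", words)); // Hello World
    }
}
```

Passing a different BreakIterator (for example a sentence instance) changes where titlecasing is applied without changing the helper.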
UNKNOWN: ICU 2.6
UNKNOWN: ICU 2.6
See Also: CODEPOINT_MAX_VALUE
UNKNOWN: ICU 3.0
See Also: LEAD_SURROGATE_MAX_VALUE
UNKNOWN: ICU 3.0
See Also: TRAIL_SURROGATE_MAX_VALUE
UNKNOWN: ICU 3.0
UNKNOWN: ICU 3.4 This API might change or be removed in a future release.
See Also: SURROGATE_MAX_VALUE
UNKNOWN: ICU 3.0
UNKNOWN: ICU 2.1
See Also: CODEPOINT_MIN_VALUE
UNKNOWN: ICU 3.0
See Also: LEAD_SURROGATE_MIN_VALUE
UNKNOWN: ICU 3.0
See Also: TRAIL_SURROGATE_MIN_VALUE
UNKNOWN: ICU 3.0
UNKNOWN: ICU 3.4 This API might change or be removed in a future release.
See Also: SUPPLEMENTARY_MIN_VALUE
UNKNOWN: ICU 3.0
See Also: SURROGATE_MIN_VALUE
UNKNOWN: ICU 3.0
UNKNOWN: ICU 2.1
See Also: UCharacter
UNKNOWN: ICU 2.4
UNKNOWN: ICU 2.1
UNKNOWN: ICU 2.1
Parameters: cp the code point to check
Returns: the number of chars needed to represent the code point
See Also: UTF16
UNKNOWN: ICU 3.0
Parameters: seq the characters to check index the index of the first or only char forming the code point
Returns: the code point at the index
UNKNOWN: ICU 3.0
Parameters: text the characters to check index the index of the first or only char forming the code point
Returns: the code point at the index
UNKNOWN: ICU 3.0
Parameters: text the characters to check index the index of the first or only char forming the code point limit the limit of the valid text
Returns: the code point at the index
UNKNOWN: ICU 3.0
Parameters: seq the characters to check index the index after the last or only char forming the code point
Returns: the code point before the index
UNKNOWN: ICU 3.0
Parameters: text the characters to check index the index after the last or only char forming the code point
Returns: the code point before the index
UNKNOWN: ICU 3.0
Parameters: text the characters to check index the index after the last or only char forming the code point limit the start of the valid text
Returns: the code point before the index
UNKNOWN: ICU 3.0
Parameters: text the characters to check start the start of the range limit the limit of the range
Returns: the number of code points in the range
UNKNOWN: ICU 3.0
Parameters: text the characters to check start the start of the range limit the limit of the range
Returns: the number of code points in the range
UNKNOWN: ICU 3.0
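The codePointAt, codePointBefore, codePointCount, and charCount entries above cover JDK methods of the same names. A minimal JDK-only sketch of walking a string by code point follows (the codePoints helper is hypothetical). Note that UCharacter also documents char[] overloads with an explicit limit, which this sketch does not use.

```java
final class CodePointIteration {
    /** Collects the code points of a CharSequence, stepping by charCount. */
    static int[] codePoints(CharSequence text) {
        int[] result = new int[Character.codePointCount(text, 0, text.length())];
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i); // combines a lead + trail surrogate
            result[n++] = cp;
            i += Character.charCount(cp);            // 2 for supplementary, else 1
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(s.length());                                           // 4 chars
        System.out.println(codePoints(s).length);                                  // 3 code points
        System.out.println(Integer.toHexString(Character.codePointBefore(s, 3))); // 1f600
        System.out.println(Character.offsetByCodePoints(s, 0, 2));                 // 3
    }
}
```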
This method observes the semantics of java.lang.Character.digit(). Note that it
returns positive values for some code points for which isDigit
returns false, just like java.lang.Character.
Parameters: ch the code point to query radix the radix
Returns: the numeric value represented by the code point in the specified radix, or -1 if the code point is not a decimal digit or if its value is too large for the radix
UNKNOWN: ICU 2.1
This is a convenience overload of digit(int, int)
that provides a decimal radix.
Parameters: ch the code point to query
Returns: the numeric value represented by the code point, or -1 if the code point is not a decimal digit or if its value is too large for a decimal radix
UNKNOWN: ICU 2.1
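Because digit follows java.lang.Character.digit semantics, a JDK-only sketch can show the behavior described above: fullwidth digits count as decimal digits, '9' is rejected in radix 8, and 'z' has the value 35 in radix 36 even though isDigit('z') is false. The parse helper is hypothetical and does not check for overflow.

```java
final class DigitDemo {
    /** Parses a non-negative number code point by code point; -1 if any digit is invalid. */
    static long parse(String s, int radix) {
        long value = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int d = Character.digit(cp, radix); // -1 if cp is not a digit in this radix
            if (d < 0) {
                return -1;
            }
            value = value * radix + d;          // no overflow check in this sketch
            i += Character.charCount(cp);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("\uFF11\uFF12", 10)); // 12 (FULLWIDTH DIGIT ONE, TWO)
        System.out.println(parse("19", 8));            // -1: '9' is too large for radix 8
        System.out.println(parse("zz", 36));           // 1295 = 35 * 36 + 35
        System.out.println(Character.isDigit('z'));    // false
    }
}
```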
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch the character to be converted defaultmapping Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise, the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt are skipped.
Returns: the case folding equivalent of the character, if any; otherwise the character itself.
See Also: UCharacter
UNKNOWN: ICU 2.1
Parameters: str the String to be converted defaultmapping Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise, the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt are skipped.
Returns: the case folding equivalent of the string.
See Also: UCharacter
UNKNOWN: ICU 2.1
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch the character to be converted options A bit set for special processing. Currently the recognised options are FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT
Returns: the case folding equivalent of the character, if any; otherwise the character itself.
See Also: UCharacter
UNKNOWN: ICU 2.6
Parameters: str the String to be converted options A bit set for special processing. Currently the recognised options are FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT
Returns: the case folding equivalent of the string.
See Also: UCharacter
UNKNOWN: ICU 2.6
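The note above about simple versus full case mappings can be illustrated with the JDK. The JDK has no public case-folding API, but uppercasing shows the same distinction: the single-code-point mapping leaves ß (U+00DF) unchanged, while the string mapping expands it to "SS". CaseFolding.txt likewise gives ß the full folding "ss". The helper names below are hypothetical.

```java
import java.util.Locale;

final class CaseMappingDemo {
    /** Simple mapping: exactly one code point in, one code point out. */
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    /** Full mapping on a whole string; the result length may differ. */
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(simpleUpper(0x00DF))); // df: unchanged
        System.out.println(fullUpper("stra\u00DFe"));                 // STRASSE
    }
}
```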
UNKNOWN: ICU 3.0
Get the "age" of the code point.
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
Parameters: ch The code point.
Returns: the Unicode version number
UNKNOWN: ICU 2.6
Find a Unicode character by its name and return its code point value. All Unicode names are in uppercase. Extended names are all lowercase except for numbers and are contained within angle brackets.
The names are searched in the following order:
Parameters: name codepoint name
Returns: code point associated with the name or -1 if the name is not found.
UNKNOWN: ICU 2.6
Find a Unicode code point by its most current Unicode name and return its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. get*Name*(), incurs a one-time initialisation cost to construct the name tables.
Parameters: name most current Unicode character name whose code point is to be returned
Returns: code point or -1 if name is not found
UNKNOWN: ICU 2.1
Find a Unicode character by its version 1.0 Unicode name and return its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. get*Name*(), incurs a one-time initialisation cost to construct the name tables.
Parameters: name Unicode 1.0 code point name whose code point is to be returned
Returns: code point or -1 if name is not found
UNKNOWN: ICU 2.1
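For comparison, the JDK offers name lookup in both directions: Character.getName(int) (Java 7 and later) and Character.codePointOf(String) (Java 9 and later). The sketch below wraps the latter so that it follows the -1 convention documented above. The wrapper name is hypothetical, and the JDK's name data need not match the Unicode version of a given ICU release.

```java
final class NameLookupDemo {
    /** Returns the code point for a Unicode character name, or -1 if it is unknown. */
    static int codePointForName(String name) {
        try {
            return Character.codePointOf(name); // Java 9+; throws for unknown names
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(Character.getName(0x41)); // LATIN CAPITAL LETTER A
        System.out.println(Integer.toHexString(codePointForName("LATIN SMALL LETTER A"))); // 61
        System.out.println(codePointForName("NO SUCH CHARACTER")); // -1
    }
}
```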
Parameters: lead the lead char trail the trail char
Returns: code point if surrogate characters are valid.
Throws: IllegalArgumentException thrown when argument characters do not form a valid codepoint
UNKNOWN: ICU 2.1
Parameters: char16 the UTF16 character
Returns: code point if argument is a valid character.
Throws: IllegalArgumentException thrown when char16 is not a valid codepoint
UNKNOWN: ICU 2.1
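The getCodePoint(char lead, char trail) entry above returns the code point for a valid surrogate pair and throws IllegalArgumentException otherwise. A hypothetical JDK-only helper with the same contract, not ICU's implementation, could look like this:

```java
final class SurrogateDecode {
    /** Combines a lead (high) and trail (low) surrogate, rejecting anything else. */
    static int combine(char lead, char trail) {
        if (!Character.isSurrogatePair(lead, trail)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Equivalent arithmetic: ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000
        return Character.toCodePoint(lead, trail);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine('\uD83D', '\uDE00'))); // 1f600
        try {
            combine('a', 'b');
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```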
Parameters: ch code point whose combining class is to be retrieved
Returns: the combining class of the codepoint
UNKNOWN: ICU 2.1
Parameters: ch the code point whose direction is to be determined
Returns: direction constant from UCharacterDirection.
UNKNOWN: ICU 2.1
Covers the java.lang.Character API, for convenience.
Parameters: cp the code point to check
Returns: the directionality of the code point
See Also: UCharacter
UNKNOWN: ICU 3.0
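getDirectionality(int) is documented as covering the JDK API, so it is expected to return the same byte constants as java.lang.Character. As a JDK-only sketch (the isStrongRtl helper is hypothetical), strong right-to-left code points are those whose directionality is RIGHT_TO_LEFT (Bidi class R, e.g. Hebrew) or RIGHT_TO_LEFT_ARABIC (AL, e.g. Arabic).

```java
final class DirectionalityDemo {
    /** True for strong right-to-left code points (Bidi classes R and AL). */
    static boolean isStrongRtl(int cp) {
        byte d = Character.getDirectionality(cp);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        System.out.println(isStrongRtl(0x05D0)); // true: HEBREW LETTER ALEF (R)
        System.out.println(isStrongRtl(0x0627)); // true: ARABIC LETTER ALEF (AL)
        System.out.println(isStrongRtl('A'));    // false: LATIN CAPITAL LETTER A (L)
    }
}
```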
Retrieves a name for a valid code point. Unlike getName(int) and getName1_0(int), this method will return a name even for code points that are not assigned a name in UnicodeData.txt.
The names are returned in the following order:
Parameters: ch the code point for which to get the name
Returns: a name for the argument codepoint
UNKNOWN: ICU 2.6
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the extended names. For modern, most up-to-date Unicode names use getNameIterator() or for older 1.0 Unicode names use getName1_0Iterator().
Example of use:
    ValueIterator iterator = UCharacter.getExtendedNameIterator();
    ValueIterator.Element element = new ValueIterator.Element();
    while (iterator.next(element)) {
        System.out.println("Codepoint \\u" + Integer.toHexString(element.codepoint)
                + " has the name " + (String) element.value);
    }
The maximal range which the name iterator iterates is from
Returns: an iterator
UNKNOWN: ICU 2.6
Parameters: ch code point to query
Returns: the numeric value if ch is a Han 'numeric character'; otherwise -1.
UNKNOWN: ICU 2.4
Parameters: type UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT.
Returns: Maximum value returned by getIntPropertyValue(int, int) for a Unicode property. <= 0 if the property selector 'type' is out of range.
See Also: UProperty, UCharacter
UNKNOWN: ICU 2.4
Parameters: type UProperty selector constant, identifies which binary property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT.
Returns: Minimum value returned by UCharacter.getIntPropertyValue(int, int) for a Unicode property. 0 if the property selector 'type' is out of range.
See Also: UProperty, UCharacter
UNKNOWN: ICU 2.4
Gets the property value for a Unicode property type of a code point. Also returns binary and mask property values.
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Sample usage:
    int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
    int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
    boolean b = (ideo == 1);
Parameters: ch code point to test. type UProperty selector constant, identifies which property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT or UProperty.MASK_START <= type < UProperty.MASK_LIMIT.
Returns: numeric value that is directly the property value or, for enumerated properties, corresponds to the numeric value of the enumerated constant of the respective property value enumeration type (cast to enum type if necessary). Returns 0 or 1 (for false / true) for binary Unicode properties. Returns a bit-mask for mask properties. Returns 0 if 'type' is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point.
See Also: UProperty UCharacter UCharacter UCharacter UCharacter
UNKNOWN: ICU 2.4
Parameters: ch The code point for which to get the ISO comment. It must be 0 <= ch <= 0x10ffff.
Returns: The ISO comment, or null if there is no comment for this character.
UNKNOWN: ICU 2.4
Parameters: ch code point whose mirror is to be retrieved
Returns: another code point that may serve as a mirror-image substitute, or ch itself if there is no such mapping or ch does not have the "mirrored" property
UNKNOWN: ICU 2.1
Parameters: ch the code point for which to get the name
Returns: most current Unicode name
UNKNOWN: ICU 2.1
Deprecated: This API is ICU internal only.
Gets the names for each of the characters in a string.
Parameters: s string to format. separator string to go between names
Returns: string of names
Parameters: ch the code point for which to get the name
Returns: version 1.0 Unicode name
UNKNOWN: ICU 2.1
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the older 1.0 Unicode names. For modern, most up-to-date Unicode names use getNameIterator() or for extended names use getExtendedNameIterator().
Example of use:ValueIterator iterator = UCharacter.get1_0NameIterator(); ValueIterator.Element element = new ValueIterator.Element(); while (iterator.next(element)) { System.out.println("Codepoint \\u" + Integer.toHexString(element.codepoint) + " has the name " + (String)element.value); }
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
Returns: an iterator
UNKNOWN: ICU 2.6
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the modern, most up-to-date Unicode names. For older 1.0 Unicode names use get1_0NameIterator() or for extended names use getExtendedNameIterator().
Example of use:ValueIterator iterator = UCharacter.getNameIterator(); ValueIterator.Element element = new ValueIterator.Element(); while (iterator.next(element)) { System.out.println("Codepoint \\u" + Integer.toHexString(element.codepoint) + " has the name " + (String)element.value); }
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
Returns: an iterator
UNKNOWN: ICU 2.6
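The name iterators above walk code points and report the name of each. As a rough, JDK-only stand-in (not the ICU iterator), the sketch below loops over a half-open range and asks java.lang.Character.getName (Java 7+) for each name, skipping unassigned code points, for which it returns null. The class and method names are hypothetical, and the JDK's names can differ from ICU's where the two ship different Unicode versions.

```java
import java.util.ArrayList;
import java.util.List;

// JDK-only approximation of a name iterator; not the ICU ValueIterator API.
final class NameListing {
    /** Returns "U+XXXX NAME" for each code point in [start, limit) that has a name. */
    static List<String> names(int start, int limit) {
        List<String> out = new ArrayList<>();
        for (int cp = start; cp < limit; cp++) {
            String name = Character.getName(cp); // null for unassigned code points
            if (name != null) {
                out.add(String.format("U+%04X %s", cp, name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : names(0x41, 0x44)) {
            System.out.println(line); // e.g. U+0041 LATIN CAPITAL LETTER A
        }
    }
}
```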
Parameters: ch the code point to query
Returns: the numeric value of the code point, or -1 if it has no numeric value, or -2 if it has a numeric value that cannot be represented as a nonnegative integer
UNKNOWN: ICU 2.1
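The -1 / -2 return convention above is easy to misread as a real value. A minimal sketch of handling it, using the JDK's java.lang.Character.getNumericValue(int) as an analogue because it documents the same two sentinels; the helper name describe is hypothetical.

```java
// Interprets the -1 / -2 sentinels of an int-returning numeric value API.
final class NumericValueDemo {
    static String describe(int cp) {
        int v = Character.getNumericValue(cp); // JDK analogue with the same -1 / -2 convention
        if (v == -1) {
            return "no numeric value";
        }
        if (v == -2) {
            return "numeric value is not a nonnegative integer";
        }
        return Integer.toString(v);
    }

    public static void main(String[] args) {
        System.out.println(describe('7'));
        System.out.println(describe(0x0665)); // ARABIC-INDIC DIGIT FIVE
        System.out.println(describe(0x00BD)); // VULGAR FRACTION ONE HALF, a fractional value
        System.out.println(describe(' '));
    }
}
```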
Parameters: propertyAlias the property name to be matched. The name is compared using "loose matching" as described in PropertyAliases.txt.
Returns: a UProperty enum.
Throws: IllegalArgumentException thrown if propertyAlias is not recognized.
See Also: UProperty
UNKNOWN: ICU 2.4
Parameters: property UProperty selector. nameChoice UProperty.NameChoice selector for which name to get. All properties have a long name. Most have a short name, but some do not. Unicode allows for additional names; if present these will be returned by UProperty.NameChoice.LONG + i, where i=1, 2,...
Returns: a name, or null if Unicode explicitly defines no name ("n/a") for a given property/nameChoice. If a given nameChoice throws an exception, then all larger values of nameChoice will throw an exception. If null is returned for a given nameChoice, then other nameChoice values may return non-null results.
Throws: IllegalArgumentException thrown if property or nameChoice are invalid.
See Also: UProperty NameChoice
UNKNOWN: ICU 2.4
Parameters: property UProperty selector constant. UProperty.INT_START <= property < UProperty.INT_LIMIT or UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or UProperty.MASK_START <= property < UProperty.MASK_LIMIT. Only these properties can be enumerated. valueAlias the value name to be matched. The name is compared using "loose matching" as described in PropertyValueAliases.txt.
Returns: a value integer. Note: UProperty.GENERAL_CATEGORY_MASK values are mask values produced by left-shifting 1 by UCharacter.getType(). This allows grouped categories such as [:L:] to be represented.
Throws: IllegalArgumentException if property is not a valid UProperty selector
See Also: UProperty
UNKNOWN: ICU 2.4
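The mask encoding of general categories (1 left-shifted by the category value) can be illustrated without ICU. The sketch assumes that, for the letter categories used here, the java.lang.Character.getType constants equal the UCharacterCategory values (Lu = 1 through Lo = 5); the class and helper names are hypothetical. It builds the grouped [:L:] mask and tests a code point's single-bit mask against it.

```java
// Sketch of general-category mask arithmetic using java.lang.Character.getType.
final class CategoryMaskDemo {
    // Grouped category [:L:] = Lu | Ll | Lt | Lm | Lo, one bit per category value.
    static final int LETTER_MASK =
            (1 << Character.UPPERCASE_LETTER)
            | (1 << Character.LOWERCASE_LETTER)
            | (1 << Character.TITLECASE_LETTER)
            | (1 << Character.MODIFIER_LETTER)
            | (1 << Character.OTHER_LETTER);

    /** Single-bit mask for the general category of cp. */
    static int categoryMask(int cp) {
        return 1 << Character.getType(cp);
    }

    /** True if cp's category bit is set in groupMask. */
    static boolean isInGroup(int cp, int groupMask) {
        return (categoryMask(cp) & groupMask) != 0;
    }

    public static void main(String[] args) {
        System.out.printf("LETTER_MASK = 0x%X%n", LETTER_MASK); // 0x3E
        System.out.println(isInGroup('A', LETTER_MASK));      // true
        System.out.println(isInGroup('7', LETTER_MASK));      // false
    }
}
```

Because each category owns one bit, any grouping of categories is a single int and membership is one AND operation.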
Parameters: property UProperty selector constant. UProperty.INT_START <= property < UProperty.INT_LIMIT or UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or UProperty.MASK_START <= property < UProperty.MASK_LIMIT. If out of range, null is returned. value selector for a value for the given property. In general, valid values range from 0 up to some maximum. There are a few exceptions: (1.) UProperty.BLOCK values begin at the non-zero value BASIC_LATIN.getID(). (2.) UProperty.CANONICAL_COMBINING_CLASS values are not contiguous and range from 0..240. (3.) UProperty.GENERAL_CATEGORY_MASK values are mask values produced by left-shifting 1 by UCharacter.getType(). This allows grouped categories such as [:L:] to be represented. Mask values are non-contiguous. nameChoice UProperty.NameChoice selector for which name to get. All values have a long name. Most have a short name, but some do not. Unicode allows for additional names; if present these will be returned by UProperty.NameChoice.LONG + i, where i=1, 2,...
Returns: a name, or null if Unicode explicitly defines no name ("n/a") for a given property/value/nameChoice. If a given nameChoice throws an exception, then all larger values of nameChoice will throw an exception. If null is returned for a given nameChoice, then other nameChoice values may return non-null results.
Throws: IllegalArgumentException thrown if property, value, or nameChoice are invalid.
See Also: UProperty NameChoice
UNKNOWN: ICU 2.4
Deprecated: This API is ICU internal only.
Returns a string version of the property value.
Parameters: propertyEnum, codepoint, nameChoice
Returns: value as string
Parameters: ch code point whose type is to be determined
Returns: category which is a value of UCharacterCategory
UNKNOWN: ICU 2.1
Gets an iterator for character types, iterating over codepoints.
Example of use:RangeValueIterator iterator = UCharacter.getTypeIterator(); RangeValueIterator.Element element = new RangeValueIterator.Element(); while (iterator.next(element)) { System.out.println("Codepoint \\u" + Integer.toHexString(element.start) + " to codepoint \\u" + Integer.toHexString(element.limit - 1) + " has the character type " + element.value); }
Returns: an iterator
UNKNOWN: ICU 2.6
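The type iterator reports runs of code points that share a character type, with an exclusive limit (hence element.limit - 1 in the example). A JDK-only sketch of the same idea over a bounded range, using java.lang.Character.getType instead of ICU data; the class, Range type, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Merges consecutive code points with the same Character.getType value into [start, limit) runs.
final class TypeRanges {
    /** One run of code points sharing a general category; limit is exclusive. */
    static final class Range {
        final int start;
        final int limit;
        final int type;

        Range(int start, int limit, int type) {
            this.start = start;
            this.limit = limit;
            this.type = type;
        }

        @Override
        public String toString() {
            return String.format("U+%04X..U+%04X type=%d", start, limit - 1, type);
        }
    }

    /** Splits [from, to) into maximal runs of equal general category. */
    static List<Range> typeRanges(int from, int to) {
        List<Range> out = new ArrayList<>();
        int start = from;
        while (start < to) {
            int type = Character.getType(start);
            int limit = start + 1;
            while (limit < to && Character.getType(limit) == type) {
                limit++;
            }
            out.add(new Range(start, limit, type));
            start = limit;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Range r : typeRanges(0x30, 0x5B)) {
            System.out.println(r);
        }
    }
}
```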
Get the numeric value for a Unicode code point as defined in the Unicode Character Database.
A "double" return type is necessary because some numeric values are fractions, negative, or too large for int.
For characters without any numeric values in the Unicode Character Database, this function will return NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API had a return type of int and returned -1 when the argument ch did not have a corresponding numeric value. This was changed to match ICU4C.
This corresponds to the ICU4C function u_getNumericValue.
Parameters: ch Code point to get the numeric value for.
Returns: numeric value of ch, or NO_NUMERIC_VALUE if none is defined.
UNKNOWN: ICU 2.4
Returns: the unicode version number used
UNKNOWN: ICU 2.1
Check a binary Unicode property for a code point.
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
This API is intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
This API does not check the validity of the codepoint.
Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked with "new" are unavailable or only partially available.
Parameters: ch code point to test. property selector constant from com.ibm.icu.lang.UProperty, identifies which binary property to check.
Returns: true or false according to the binary Unicode property value for ch. Also false if property is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point.
See Also: UProperty
UNKNOWN: ICU 2.6
Parameters: ch code point to be determined if it is of base form
Returns: true if the code point is of base form
UNKNOWN: ICU 2.1
Parameters: ch code point to be determined if it is not a supplementary character
Returns: true if code point is not a supplementary character
UNKNOWN: ICU 2.1
Parameters: ch code point to be determined if it is defined in the most current version of Unicode
Returns: true if this code point is defined in Unicode
UNKNOWN: ICU 2.1
java.lang.Character.isDigit(). It returns true for decimal digits only.
Parameters: ch code point to query
Returns: true if this code point is a digit
UNKNOWN: ICU 2.1
Parameters: ch the char to check
Returns: true if ch is a high (lead) surrogate
UNKNOWN: ICU 3.0
Parameters: ch code point to be determined if it can be ignored in a Unicode identifier.
Returns: true if the code point is ignorable
UNKNOWN: ICU 2.1
Parameters: ch code point to determine if it is an ISO control character
Returns: true if code point is an ISO control character
UNKNOWN: ICU 2.1
Parameters: cp the code point
Returns: true if the code point can continue a Java identifier.
UNKNOWN: ICU 3.4 This API might change or be removed in a future release.
Parameters: cp the code point
Returns: true if the code point can start a Java identifier.
UNKNOWN: ICU 3.4 This API might change or be removed in a future release.
Deprecated: ICU 3.4 (Java)
Compatibility override of Java deprecated method. This method will always remain deprecated. Delegates to java.lang.Character.isJavaIdentifierStart.
Parameters: cp the code point
Returns: true if the code point can start a Java identifier.
Deprecated: ICU 3.4 (Java)
Compatibility override of Java deprecated method. This method will always remain deprecated. Delegates to java.lang.Character.isJavaIdentifierPart.
Parameters: cp the code point
Returns: true if the code point can continue a Java identifier.
Parameters: ch code point to determine if it is a legal code point by itself
Returns: true if and only if legal.
UNKNOWN: ICU 2.1
Parameters: str containing code points to examine
Returns: true if and only if legal.
UNKNOWN: ICU 2.1
Parameters: ch code point to determine if it is a letter
Returns: true if code point is a letter
UNKNOWN: ICU 2.1
Parameters: ch code point to determine if it is a letter or a digit
Returns: true if code point is a letter or a digit
UNKNOWN: ICU 2.1
Parameters: ch code point to determine if it is in lowercase
Returns: true if code point is a lowercase character
UNKNOWN: ICU 2.1
Parameters: ch the char to check
Returns: true if ch is a low (trail) surrogate
UNKNOWN: ICU 3.0
Parameters: ch code point whose mirror is to be determined
Returns: true if the code point has the "mirrored" property
UNKNOWN: ICU 2.1
Parameters: ch code point to be determined if it is printable
Returns: true if the code point is a printable character
UNKNOWN: ICU 2.1
Deprecated: ICU 3.4 (Java)
Compatibility override of Java deprecated method. This method will always remain deprecated. Delegates to java.lang.Character.isSpace.
Parameters: ch the code point
Returns: true if the code point is a space character as defined by java.lang.Character.isSpace.
Parameters: ch code point to determine if it is a space
Returns: true if the specified code point is a space character
UNKNOWN: ICU 2.1
Parameters: ch code point to be determined if it is in the supplementary plane
Returns: true if code point is a supplementary character
UNKNOWN: ICU 2.1
Parameters: cp the code point to check
Returns: true if cp is a supplementary code point
UNKNOWN: ICU 3.0
Parameters: high the high (lead) char low the low (trail) char
Returns: true if high, low form a surrogate pair
UNKNOWN: ICU 3.0
Parameters: ch code point to determine if it is in title case
Returns: true if the specified code point is a titlecase character
UNKNOWN: ICU 2.1
Check if a code point has the Alphabetic Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).
Different from UCharacter.isLetter(ch)!
Parameters: ch codepoint to be tested
UNKNOWN: ICU 2.6
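To see why the Alphabetic property differs from isLetter, a JDK-only sketch can use java.lang.Character.isAlphabetic (Java 7+), which also follows the Unicode Alphabetic property, alongside Character.isLetter. Letter numbers (general category Nl) such as U+2160 ROMAN NUMERAL ONE are Alphabetic but not letters. The class and helper names are hypothetical.

```java
// Contrasts the Alphabetic property with the letter general categories, using the JDK.
final class AlphabeticVsLetter {
    /** True for code points that are Alphabetic but not in a letter general category. */
    static boolean alphabeticButNotLetter(int cp) {
        return Character.isAlphabetic(cp) && !Character.isLetter(cp);
    }

    public static void main(String[] args) {
        // U+2160 ROMAN NUMERAL ONE has general category Nl (letter number).
        System.out.println(alphabeticButNotLetter(0x2160)); // true
        System.out.println(alphabeticButNotLetter('A'));    // false: 'A' is both
    }
}
```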
Check if a code point has the Lowercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).
This is different from UCharacter.isLowerCase(ch)!
Parameters: ch codepoint to be tested
UNKNOWN: ICU 2.6
Parameters: ch code point to determine if it can be part of a Unicode identifier
Returns: true if the code point can appear in a Unicode identifier after the first character
UNKNOWN: ICU 2.1
Parameters: ch code point to determine if it can start a Unicode identifier
Returns: true if the code point can be the first character of a Unicode identifier
UNKNOWN: ICU 2.1
Parameters: ch code point to determine if it is in uppercase
Returns: true if the code point is an uppercase character
UNKNOWN: ICU 2.1
Check if a code point has the Uppercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).
This is different from UCharacter.isUpperCase(ch)!
Parameters: ch codepoint to be tested
UNKNOWN: ICU 2.6
Check if a code point has the White_Space Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).
This is different from both UCharacter.isSpace(ch) and UCharacter.isWhitespace(ch)!
Parameters: ch codepoint to be tested
UNKNOWN: ICU 2.6
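The three whitespace notions really do disagree. Both U+0009 (TAB) and U+00A0 (NO-BREAK SPACE) have White_Space=yes in the UCD's PropList.txt, yet the JDK's java.lang.Character.isWhitespace and isSpaceChar, used here as stand-ins for the Java-compatible checks, each reject one of them. The class and helper names are hypothetical.

```java
// Shows how isWhitespace and isSpaceChar split two White_Space characters.
final class WhitespaceFlavors {
    static String flags(int cp) {
        return "isWhitespace=" + Character.isWhitespace(cp)
                + ", isSpaceChar=" + Character.isSpaceChar(cp);
    }

    public static void main(String[] args) {
        System.out.println(flags('\t'));   // isWhitespace=true, isSpaceChar=false (TAB is Cc)
        System.out.println(flags(0x00A0)); // isWhitespace=false, isSpaceChar=true (no-break space)
    }
}
```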
Parameters: cp the code point to check
Returns: true if cp is a valid code point
UNKNOWN: ICU 3.0
Parameters: ch code point to determine if it is a white space
Returns: true if the specified code point is a white space character
UNKNOWN: ICU 2.1
Parameters: text the characters to check index the index to adjust codePointOffset the number of code points by which to offset the index
Returns: the adjusted index
UNKNOWN: ICU 3.0
Parameters: text the characters to check start the start of the range to check count the length of the range to check index the index to adjust codePointOffset the number of code points by which to offset the index
Returns: the adjusted index
UNKNOWN: ICU 3.0
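Offsetting an index by code points means stepping over a surrogate pair as one unit. A hand-written sketch of the unbounded overload, meant to mirror java.lang.Character.offsetByCodePoints (an unpaired surrogate counts as one code point); it is illustrative, not ICU's implementation, and the class name is hypothetical.

```java
// Moves an index in UTF-16 text forward or backward by a number of code points.
final class CodePointOffsets {
    static int offsetByCodePoints(CharSequence text, int index, int codePointOffset) {
        if (index < 0 || index > text.length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int i = index;
        if (codePointOffset >= 0) {
            for (int n = 0; n < codePointOffset; n++) {
                if (i >= text.length()) {
                    throw new IndexOutOfBoundsException("ran past end of text");
                }
                // Step over both halves when a lead surrogate is followed by a trail surrogate.
                boolean pair = Character.isHighSurrogate(text.charAt(i))
                        && i + 1 < text.length()
                        && Character.isLowSurrogate(text.charAt(i + 1));
                i += pair ? 2 : 1;
            }
        } else {
            for (int n = 0; n > codePointOffset; n--) {
                if (i <= 0) {
                    throw new IndexOutOfBoundsException("ran past start of text");
                }
                // Step back over both halves when a trail surrogate is preceded by a lead surrogate.
                boolean pair = Character.isLowSurrogate(text.charAt(i - 1))
                        && i >= 2
                        && Character.isHighSurrogate(text.charAt(i - 2));
                i -= pair ? 2 : 1;
            }
        }
        return i;
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F600)) + "b"; // 3 code points, 4 chars
        System.out.println(offsetByCodePoints(s, 0, 2));  // 3, the index of "b"
        System.out.println(offsetByCodePoints(s, 4, -2)); // 1, the start of the surrogate pair
    }
}
```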
Parameters: cp the code point to convert dst the destination array into which to put the char(s) representing the code point dstIndex the index at which to put the first (or only) char
Returns: the count of the number of chars written (1 or 2)
Throws: IllegalArgumentException if cp is not a valid code point
UNKNOWN: ICU 3.0
Parameters: cp the code point to convert
Returns: an array containing the char(s) representing the code point
Throws: IllegalArgumentException if cp is not a valid code point
UNKNOWN: ICU 3.0
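The conversion from a code point to one or two chars is plain UTF-16 arithmetic. The sketch below writes it out by hand (class name hypothetical) so the split is visible: subtract 0x10000, put the high 10 bits in a lead surrogate and the low 10 bits in a trail surrogate. It should agree with java.lang.Character.toChars.

```java
// Hand-written UTF-16 encoding of a single code point.
final class SurrogateEncoding {
    static int toChars(int cp, char[] dst, int dstIndex) {
        if (cp < 0 || cp > 0x10FFFF) {
            throw new IllegalArgumentException("not a valid code point: " + cp);
        }
        if (cp < 0x10000) {
            // BMP code point: a single char holds it directly.
            dst[dstIndex] = (char) cp;
            return 1;
        }
        // Supplementary code point: subtract 0x10000 and split the remaining 20 bits.
        int v = cp - 0x10000;
        dst[dstIndex] = (char) (0xD800 + (v >>> 10));      // lead surrogate: high 10 bits
        dst[dstIndex + 1] = (char) (0xDC00 + (v & 0x3FF)); // trail surrogate: low 10 bits
        return 2;
    }

    static char[] toChars(int cp) {
        char[] buf = new char[2];
        int n = toChars(cp, buf, 0);
        return n == 1 ? new char[] {buf[0]} : buf;
    }

    public static void main(String[] args) {
        char[] chars = toChars(0x1F600);
        System.out.printf("%04X %04X%n", (int) chars[0], (int) chars[1]); // D83D DE00
    }
}
```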
Parameters: high the high (lead) surrogate low the low (trail) surrogate
Returns: the code point formed by the surrogate pair
UNKNOWN: ICU 3.0
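Combining a surrogate pair is the inverse arithmetic. A hand-written sketch (class name hypothetical) of the pair check and the combination; like java.lang.Character.toCodePoint, toCodePoint here does not validate its inputs, so callers should check isSurrogatePair first.

```java
// Hand-written surrogate pair check and decoding.
final class SurrogateDecoding {
    static boolean isSurrogatePair(char high, char low) {
        return high >= 0xD800 && high <= 0xDBFF  // lead (high) surrogate range
                && low >= 0xDC00 && low <= 0xDFFF; // trail (low) surrogate range
    }

    static int toCodePoint(char high, char low) {
        // 10 bits from each half, plus the 0x10000 offset; no validation is done here.
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = (char) 0xD83D;
        char low = (char) 0xDE00;
        if (isSurrogatePair(high, low)) {
            System.out.printf("U+%X%n", toCodePoint(high, low)); // U+1F600
        }
    }
}
```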
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch code point whose lowercase equivalent is to be retrieved
Returns: the lowercase equivalent code point
UNKNOWN: ICU 2.1
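The difference between simple and full lowercase mappings shows up with U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE. The sketch uses the JDK as a stand-in: java.lang.Character.toLowerCase(int) applies the simple mapping (one code point to one code point), while String.toLowerCase(Locale) applies full, language-sensitive mappings that can change the string length. The class and helper names are hypothetical.

```java
import java.util.Locale;

// Simple (per-code-point) vs. full (string, locale-aware) lowercase mapping.
final class LowercaseMappings {
    /** Simple mapping: exactly one code point in, one code point out. */
    static int simpleLower(int cp) {
        return Character.toLowerCase(cp);
    }

    /** Full mapping on a whole string: language-sensitive and may change the length. */
    static String fullLower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String capitalIWithDot = "\u0130";
        System.out.printf("simple: U+%04X%n", simpleLower(capitalIWithDot.codePointAt(0))); // U+0069
        // Root locale: "i" followed by U+0307 COMBINING DOT ABOVE, so 2 chars.
        System.out.println("full (root): " + fullLower(capitalIWithDot, Locale.ROOT).length() + " chars");
        // Turkish: a plain dotted "i".
        System.out.println("full (tr): " + fullLower(capitalIWithDot, Locale.forLanguageTag("tr")));
    }
}
```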
Parameters: str source string to be converted
Returns: lowercase version of the argument string
UNKNOWN: ICU 2.1
Parameters: locale the locale in which the string is to be converted. str source string to be converted
Returns: lowercase version of the argument string
UNKNOWN: ICU 2.1
Parameters: locale the locale in which the string is to be converted. str source string to be converted
Returns: lowercase version of the argument string
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
Parameters: ch code point
Returns: string representation of the code point, null if code point is not defined in unicode
UNKNOWN: ICU 2.1
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch code point whose title case is to be retrieved
Returns: titlecase code point
UNKNOWN: ICU 2.1
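Titlecase is not always the same as uppercase. For the digraph U+01C6 LATIN SMALL LETTER DZ WITH CARON, UnicodeData.txt gives distinct simple uppercase (U+01C4) and titlecase (U+01C5) mappings. The sketch checks this with the JDK's java.lang.Character.toUpperCase/toTitleCase as stand-ins for the ICU methods; the class and helper names are hypothetical.

```java
// Compares simple uppercase and titlecase mappings for one code point.
final class TitlecaseDigraph {
    static String describe(int cp) {
        return String.format("upper: U+%04X, title: U+%04X",
                Character.toUpperCase(cp), Character.toTitleCase(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x01C6)); // upper: U+01C4, title: U+01C5
    }
}
```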
Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so users can customize their break iterator for specialized titlecasing. In this case only forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only the characters at positions returned by the break iterator are titlecased; all characters between those positions are lowercased.
Casing depends on the default locale and is context-sensitive.
Parameters: str source string to be converted. breakiter break iterator to determine the positions at which characters should be titlecased.
Returns: titlecase version of the argument string
UNKNOWN: ICU 2.6
Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so users can customize their break iterator for specialized titlecasing. In this case only forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only the characters at positions returned by the break iterator are titlecased; all characters between those positions are lowercased.
Casing depends on the argument locale and is context-sensitive.
Parameters: locale the locale in which the string is to be converted. str source string to be converted. breakiter break iterator to determine the positions at which characters should be titlecased.
Returns: titlecase version of the argument string
UNKNOWN: ICU 2.6
Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so users can customize their break iterator for specialized titlecasing. In this case only forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.
Only the characters at positions returned by the break iterator are titlecased; all characters between those positions are lowercased.
Casing depends on the argument locale and is context-sensitive.
Parameters: locale the locale in which the string is to be converted. str source string to be converted. titleIter break iterator to determine the positions at which characters should be titlecased.
Returns: titlecase version of the argument string
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.
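The break-iterator-driven titlecasing described above can be approximated with the JDK alone. This sketch uses java.text.BreakIterator (not ICU's), titlecases the first code point of each segment with the simple per-code-point mapping, and lowercases the rest of the segment with String.toLowerCase(Locale). Unlike the ICU methods, it does not accept a null iterator, and ICU's own titlecasing may differ in context-sensitive cases. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Titlecase the first code point of each break-iterator segment, lowercase the rest.
final class TitleCaseSketch {
    static String toTitleCase(String str, BreakIterator breakIter, Locale locale) {
        breakIter.setText(str);
        StringBuilder out = new StringBuilder(str.length());
        int start = breakIter.first();
        for (int end = breakIter.next(); end != BreakIterator.DONE; start = end, end = breakIter.next()) {
            int first = str.codePointAt(start);
            // Simple titlecase mapping for the code point at the break position.
            out.appendCodePoint(Character.toTitleCase(first));
            // Locale-sensitive lowercasing for the remainder of the segment.
            out.append(str.substring(start + Character.charCount(first), end).toLowerCase(locale));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(toTitleCase("hELLO wORLD", words, Locale.ROOT)); // Hello World
    }
}
```

Passing a different BreakIterator (for example a sentence instance) changes only which positions are titlecased, which is the customization point the documentation describes.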
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://icu.sourceforge.net/userguide/posix.html#case_mappings
Parameters: ch code point whose uppercase is to be retrieved
Returns: uppercase code point
UNKNOWN: ICU 2.1
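U+00DF LATIN SMALL LETTER SHARP S is the classic case where only the full mapping works: it has no single-code-point uppercase, but its full uppercase mapping is "SS". The sketch demonstrates this with the JDK as a stand-in (Character.toUpperCase(int) for the simple mapping, String.toUpperCase(Locale) for the full one); the class and helper names are hypothetical.

```java
import java.util.Locale;

// Simple (per-code-point) vs. full (string) uppercase mapping.
final class UppercaseMappings {
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    static String fullUpper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        System.out.printf("simple: U+%04X%n", simpleUpper(0x00DF));        // U+00DF, unchanged
        System.out.println("full: " + fullUpper("stra\u00DFe", Locale.ROOT)); // STRASSE
    }
}
```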
Parameters: str source string to be converted
Returns: uppercase version of the argument string
UNKNOWN: ICU 2.1
Parameters: locale the locale in which the string is to be converted. str source string to be converted
Returns: uppercase version of the argument string
UNKNOWN: ICU 2.1
Parameters: locale the locale in which the string is to be converted. str source string to be converted
Returns: uppercase version of the argument string
UNKNOWN: ICU 3.2 This API might change or be removed in a future release.