com.icl.saxon.om

Class XMLChar

public class XMLChar extends Object

This class defines the basic XML character properties. The data in this class can be used to verify whether a character is a valid XML character, or whether it is a space, name start, or name character.

A series of convenience methods is supplied to ease the burden on the developer. Because inlining the checks can improve per-character performance, the tables of character properties are public. Using the character as an index into the CHARS array and applying the appropriate mask flag (e.g. MASK_VALID) yields the same results as calling the convenience methods, with one exception: see the comments for the isValid method for details.
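The equivalence between a table lookup and a convenience method can be sketched as follows. This is a minimal illustration, not XMLChar's actual code: the class name MaskInliningSketch, the 128-entry table and the mask value are hypothetical, and only the space characters of production [3] are populated. The point is that `(CHARS[c] & MASK_SPACE) != 0` inside a hot loop gives the same answer as calling isSpace, without the method call.

```java
// Hypothetical sketch: not the real XMLChar table, mask values, or size.
final class MaskInliningSketch {
    // Illustrative mask value (assumption).
    static final int MASK_SPACE = 0x02;

    // Property table indexed by character; ASCII only, for brevity.
    static final byte[] CHARS = new byte[128];

    static {
        // XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
        for (char c : new char[] {' ', '\t', '\r', '\n'}) {
            CHARS[c] |= MASK_SPACE;
        }
    }

    // Convenience method: bounds check plus table lookup.
    static boolean isSpace(int c) {
        return c >= 0 && c < CHARS.length && (CHARS[c] & MASK_SPACE) != 0;
    }

    // Inlined form: the same mask test written directly in the loop.
    static int countSpaces(char[] buf) {
        int n = 0;
        for (char c : buf) {
            if (c < CHARS.length && (CHARS[c] & MASK_SPACE) != 0) {
                n++;
            }
        }
        return n;
    }
}
```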

Version: $Id: XMLChar.java,v 1.16 2004/03/25 04:03:22 mrglavas Exp $

Author: Glenn Marcy, IBM; Andy Clark, IBM; Eric Ye, IBM; Arnaud Le Hors, IBM; Michael Glavassevich, IBM; Rahul Srivastava, Sun Microsystems Inc.

Field Summary
static int MASK_CONTENT
Content character mask.
static int MASK_NAME
Name character mask.
static int MASK_NAME_START
Name start character mask.
static int MASK_NCNAME
NCName character mask.
static int MASK_NCNAME_START
NCName start character mask.
static int MASK_PUBID
Pubid character mask.
static int MASK_SPACE
Space character mask.
static int MASK_VALID
Valid character mask.
Method Summary
static char highSurrogate(int c)
Returns the high surrogate of a supplemental character.
static boolean isContent(int c)
Returns true if the specified character can be considered content.
static boolean isHighSurrogate(int c)
Returns whether the given character is a high surrogate.
static boolean isInvalid(int c)
Returns true if the specified character is invalid.
static boolean isLowSurrogate(int c)
Returns whether the given character is a low surrogate.
static boolean isMarkup(int c)
Returns true if the specified character can be considered markup.
static boolean isName(int c)
Returns true if the specified character is a valid name character as defined by production [4] in the XML 1.0 specification.
static boolean isNameStart(int c)
Returns true if the specified character is a valid name start character as defined by production [5] in the XML 1.0 specification.
static boolean isNCName(int c)
Returns true if the specified character is a valid NCName character as defined by production [5] in the Namespaces in XML recommendation.
static boolean isNCNameStart(int c)
Returns true if the specified character is a valid NCName start character as defined by production [4] in the Namespaces in XML recommendation.
static boolean isPubid(int c)
Returns true if the specified character is a valid Pubid character as defined by production [13] in the XML 1.0 specification.
static boolean isSpace(int c)
Returns true if the specified character is a space character as defined by production [3] in the XML 1.0 specification.
static boolean isSupplemental(int c)
Returns true if the specified character is a supplemental character.
static boolean isSurrogate(int c)
Returns whether a given char (code point) is a surrogate (high or low). MHK: this method was reinstated from an earlier version of the Apache XMLChar module.
static boolean isValid(int c)
Returns true if the specified character is valid.
static boolean isValidIANAEncoding(String ianaEncoding)
Returns true if the encoding name is a valid IANA encoding.
static boolean isValidJavaEncoding(String javaEncoding)
Returns true if the encoding name is a valid Java encoding.
static boolean isValidName(String name)
Checks whether a string is a valid Name according to production [5] in the XML 1.0 Recommendation.
static boolean isValidNCName(String ncName)
Checks whether a string is a valid NCName according to production [4] in the XML Namespaces 1.0 Recommendation.
static boolean isValidNmtoken(String nmtoken)
Checks whether a string is a valid Nmtoken according to production [7] in the XML 1.0 Recommendation.
static char lowSurrogate(int c)
Returns the low surrogate of a supplemental character.
static int supplemental(char h, char l)
Returns the supplemental character corresponding to the given surrogates.

Field Detail

MASK_CONTENT

public static final int MASK_CONTENT
Content character mask. Special characters are those that can be considered the start of markup, such as '<' and '&'. The various newline characters are considered special as well. All other valid XML characters can be considered content.

This is an optimization for the inner loop of character scanning.
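The kind of inner scanning loop this mask is meant to speed up might look like the sketch below. It is an illustration under stated assumptions, not Saxon's scanner: the class ContentScanSketch, its ASCII-only table and the mask value are hypothetical, and the "special" set is taken from the description above ('<', '&' and the newline characters). The loop advances over plain content with one table lookup per character and stops at the first character that needs special handling.

```java
// Hypothetical sketch; table contents and mask value are assumptions.
final class ContentScanSketch {
    static final int MASK_CONTENT = 0x20; // illustrative value only
    static final byte[] CHARS = new byte[128];

    static {
        // Mark tab and printable ASCII as content...
        CHARS['\t'] |= MASK_CONTENT;
        for (int c = 0x20; c < 0x7F; c++) {
            CHARS[c] |= MASK_CONTENT;
        }
        // ...then clear the characters that can start markup.
        CHARS['<'] &= ~MASK_CONTENT;
        CHARS['&'] &= ~MASK_CONTENT;
        // '\n' and '\r' were never marked, so they count as special too.
    }

    // Returns the index of the first non-content character at or after
    // start, or s.length() if there is none. Non-ASCII characters also stop
    // the fast path (a real scanner would hand them to a slower check).
    static int scanContent(String s, int start) {
        int i = start;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c >= CHARS.length || (CHARS[c] & MASK_CONTENT) == 0) {
                break;
            }
            i++;
        }
        return i;
    }
}
```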

MASK_NAME

public static final int MASK_NAME
Name character mask.

MASK_NAME_START

public static final int MASK_NAME_START
Name start character mask.

MASK_NCNAME

public static final int MASK_NCNAME
NCName character mask.

MASK_NCNAME_START

public static final int MASK_NCNAME_START
NCName start character mask.

MASK_PUBID

public static final int MASK_PUBID
Pubid character mask.

MASK_SPACE

public static final int MASK_SPACE
Space character mask.

MASK_VALID

public static final int MASK_VALID
Valid character mask.

Method Detail

highSurrogate

public static char highSurrogate(int c)
Returns the high surrogate of a supplemental character.

Parameters: c The supplemental character to "split".

isContent

public static boolean isContent(int c)
Returns true if the specified character can be considered content.

Parameters: c The character to check.

isHighSurrogate

public static boolean isHighSurrogate(int c)
Returns whether the given character is a high surrogate.

Parameters: c The character to check.

isInvalid

public static boolean isInvalid(int c)
Returns true if the specified character is invalid.

Parameters: c The character to check.

isLowSurrogate

public static boolean isLowSurrogate(int c)
Returns whether the given character is a low surrogate.

Parameters: c The character to check.

isMarkup

public static boolean isMarkup(int c)
Returns true if the specified character can be considered markup. Markup characters include '<', '&', and '%'.

Parameters: c The character to check.

isName

public static boolean isName(int c)
Returns true if the specified character is a valid name character as defined by production [4] in the XML 1.0 specification.

Parameters: c The character to check.

isNameStart

public static boolean isNameStart(int c)
Returns true if the specified character is a valid name start character as defined by production [5] in the XML 1.0 specification.

Parameters: c The character to check.

isNCName

public static boolean isNCName(int c)
Returns true if the specified character is a valid NCName character as defined by production [5] in the Namespaces in XML recommendation.

Parameters: c The character to check.

isNCNameStart

public static boolean isNCNameStart(int c)
Returns true if the specified character is a valid NCName start character as defined by production [4] in the Namespaces in XML recommendation.

Parameters: c The character to check.

isPubid

public static boolean isPubid(int c)
Returns true if the specified character is a valid Pubid character as defined by production [13] in the XML 1.0 specification.

Parameters: c The character to check.
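Production [13] of XML 1.0 defines PubidChar as `#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]`. The sketch below applies that rule directly; it is illustrative only, and the class PubidSketch and its helper isPubidLiteralContent are hypothetical names, not part of XMLChar.

```java
// Hypothetical sketch of the PubidChar rule from XML 1.0 production [13].
final class PubidSketch {
    // Punctuation allowed in a public identifier.
    private static final String PUNCT = "-'()+,./:=?;!*#@$_%";

    static boolean isPubid(int c) {
        if (c == 0x20 || c == 0xD || c == 0xA) {
            return true;
        }
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
            return true;
        }
        return c >= 0 && c < 128 && PUNCT.indexOf(c) >= 0;
    }

    // Hypothetical helper: checks every character of a public ID literal body.
    static boolean isPubidLiteralContent(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isPubid(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the tab character is not a PubidChar, even though it is an XML space character.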

isSpace

public static boolean isSpace(int c)
Returns true if the specified character is a space character as defined by production [3] in the XML 1.0 specification.

Parameters: c The character to check.
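Production [3] admits exactly four characters: space, tab, carriage return and line feed. The sketch below (class and helper names are hypothetical) shows the check and contrasts it with java.lang.Character.isWhitespace, which accepts more characters, such as form feed, and so is not a substitute for an XML space test.

```java
// Hypothetical sketch of XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
final class SpaceSketch {
    static boolean isSpace(int c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    // Hypothetical helper: strips leading XML whitespace only.
    static String trimLeadingXmlSpace(String s) {
        int i = 0;
        while (i < s.length() && isSpace(s.charAt(i))) {
            i++;
        }
        return s.substring(i);
    }
}
```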

isSupplemental

public static boolean isSupplemental(int c)
Returns true if the specified character is a supplemental character.

Parameters: c The character to check.

isSurrogate

public static boolean isSurrogate(int c)
Returns whether a given char (code point) is a surrogate (high or low). MHK: this method was reinstated from an earlier version of the Apache XMLChar module.
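The surrogate and supplemental predicates reduce to range checks on UTF-16 code units and Unicode code points. The sketch below states those ranges explicitly; it is an illustration, and the class name SurrogateRangeSketch is hypothetical. The ranges themselves are fixed by UTF-16: high surrogates are 0xD800 to 0xDBFF, low surrogates 0xDC00 to 0xDFFF, and supplemental characters 0x10000 to 0x10FFFF.

```java
// Hypothetical sketch of the range checks behind the surrogate predicates.
final class SurrogateRangeSketch {
    static boolean isHighSurrogate(int c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    static boolean isLowSurrogate(int c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // High or low surrogate.
    static boolean isSurrogate(int c) {
        return c >= 0xD800 && c <= 0xDFFF;
    }

    // Code points above the Basic Multilingual Plane.
    static boolean isSupplemental(int c) {
        return c >= 0x10000 && c <= 0x10FFFF;
    }
}
```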

isValid

public static boolean isValid(int c)
Returns true if the specified character is valid. This method also checks the supplementary character range, 0x10000 to 0x10FFFF (characters that UTF-16 represents as surrogate pairs).

If a program applies the mask directly to the CHARS array instead, it is responsible for checking this range itself.

Parameters: c The character to check.
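For reference, XML 1.0 production [2] defines `Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`. The sketch below encodes that rule and walks a Java String by code point, so that surrogate pairs are checked as one supplementary character and an unpaired surrogate is rejected. It is illustrative; ValidCharSketch and isValidText are hypothetical names.

```java
// Hypothetical sketch of XML 1.0 production [2] (Char).
final class ValidCharSketch {
    static boolean isValid(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Iterates by code point; a lone surrogate is returned by codePointAt
    // as its own value, which isValid rejects.
    static boolean isValidText(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValid(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```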

isValidIANAEncoding

public static boolean isValidIANAEncoding(String ianaEncoding)
Returns true if the encoding name is a valid IANA encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for an IANA encoding name.

Parameters: ianaEncoding The IANA encoding name.
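The exact rule this method applies is not stated here. A plausible reading, and an assumption, is the EncName production [81] of XML 1.0, `EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*`, which is the syntax allowed in an XML declaration's encoding attribute. The hypothetical EncNameSketch below implements that production; as the text above notes, a name that passes says nothing about whether a decoder exists.

```java
// Hypothetical sketch, assuming the XML 1.0 EncName production [81]:
// EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
final class EncNameSketch {
    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean isValidEncName(String name) {
        if (name == null || name.isEmpty() || !isAsciiLetter(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = isAsciiLetter(c) || (c >= '0' && c <= '9')
                    || c == '.' || c == '_' || c == '-';
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```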

isValidJavaEncoding

public static boolean isValidJavaEncoding(String javaEncoding)
Returns true if the encoding name is a valid Java encoding. This method does not verify that there is a decoder available for this encoding, only that the characters are valid for a Java encoding name.

Parameters: javaEncoding The Java encoding name.
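The distinction between a syntactically valid name and an available decoder can be made concrete with the JDK's own java.nio.charset.Charset.isSupported. In the sketch below, the character rule in isValidJavaEncodingName (ASCII letters, digits, '.', '_' and '-') is an assumption about what counts as a valid Java encoding name, not a statement of this method's exact behaviour; JavaEncodingSketch and hasDecoder are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical sketch; the accepted character set is an assumption.
final class JavaEncodingSketch {
    static boolean isValidJavaEncodingName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Decoder availability is a separate question, answered by the JDK.
    static boolean hasDecoder(String name) {
        if (!isValidJavaEncodingName(name)) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // e.g. a name starting with '-', which Charset rejects outright
            return false;
        }
    }
}
```

A name such as "NO-SUCH-CHARSET-XYZ" is well formed under the assumed rule but has no decoder, which is exactly the gap the description above warns about.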

isValidName

public static boolean isValidName(String name)
Checks whether a string is a valid Name according to production [5] in the XML 1.0 Recommendation.

Parameters: name string to check

Returns: true if name is a valid Name

isValidNCName

public static boolean isValidNCName(String ncName)
Checks whether a string is a valid NCName according to production [4] in the XML Namespaces 1.0 Recommendation.

Parameters: ncName string to check

Returns: true if ncName is a valid NCName

isValidNmtoken

public static boolean isValidNmtoken(String nmtoken)
Checks whether a string is a valid Nmtoken according to production [7] in the XML 1.0 Recommendation.

Parameters: nmtoken string to check

Returns: true if nmtoken is a valid Nmtoken
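The three string validators differ only in their first-character rule and in whether ':' is allowed. The sketch below shows those differences side by side. To stay short it is restricted to ASCII, an assumption: the real XML 1.0 Letter, Digit, CombiningChar and Extender classes admit many more characters. NameSketch and its helpers are hypothetical, and the NCName check relies on the fact that an NCName is exactly a Name containing no colon.

```java
// Hypothetical ASCII-only sketch of the Name, NCName and Nmtoken rules.
final class NameSketch {
    // Approximates Letter | '_' | ':' (ASCII letters only).
    private static boolean isNameStartChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
    }

    // Approximates NameChar: start chars plus Digit, '.' and '-'.
    private static boolean isNameChar(char c) {
        return isNameStartChar(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    // XML 1.0 [5] Name ::= (Letter | '_' | ':') (NameChar)*
    static boolean isValidName(String s) {
        if (s.isEmpty() || !isNameStartChar(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Namespaces [4] NCName: a Name with no ':' anywhere.
    static boolean isValidNCName(String s) {
        return isValidName(s) && s.indexOf(':') < 0;
    }

    // XML 1.0 [7] Nmtoken ::= (NameChar)+  (no first-character constraint)
    static boolean isValidNmtoken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```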

lowSurrogate

public static char lowSurrogate(int c)
Returns the low surrogate of a supplemental character.

Parameters: c The supplemental character to "split".
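Splitting a supplemental character follows the standard UTF-16 encoding. Subtract 0x10000 to get a 20-bit value, put its top 10 bits in the high surrogate (offset 0xD800) and its bottom 10 bits in the low surrogate (offset 0xDC00). The sketch below shows that arithmetic; SurrogateSplitSketch is a hypothetical name, and it assumes c is already known to be in 0x10000 to 0x10FFFF.

```java
// Hypothetical sketch of the UTF-16 split; assumes 0x10000 <= c <= 0x10FFFF.
final class SurrogateSplitSketch {
    // Top 10 bits of (c - 0x10000), offset into 0xD800-0xDBFF.
    static char highSurrogate(int c) {
        return (char) (((c - 0x10000) >> 10) + 0xD800);
    }

    // Bottom 10 bits of (c - 0x10000), offset into 0xDC00-0xDFFF.
    static char lowSurrogate(int c) {
        return (char) (((c - 0x10000) & 0x3FF) + 0xDC00);
    }
}
```

For example, U+1F600 splits into 0xD83D and 0xDE00, the same pair that Character.toChars(0x1F600) produces.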

supplemental

public static int supplemental(char h, char l)
Returns the supplemental character corresponding to the given surrogates.

Parameters: h The high surrogate. l The low surrogate.
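Recombining a surrogate pair is the inverse of the UTF-16 split: `((h - 0xD800) << 10) + (l - 0xDC00) + 0x10000`. The sketch below shows the formula; SurrogateJoinSketch is a hypothetical name, and it assumes h and l are already known to be a high and a low surrogate respectively (no validation is done). The result should agree with the JDK's Character.toCodePoint(h, l).

```java
// Hypothetical sketch; assumes h is a high surrogate and l a low surrogate.
final class SurrogateJoinSketch {
    static int supplemental(char h, char l) {
        // 10 bits from each surrogate, then shift back above the BMP.
        return ((h - 0xD800) << 10) + (l - 0xDC00) + 0x10000;
    }
}
```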