public class IndicNormalizer
extends java.lang.Object
Follows the guidelines in Unicode 5.2, chapter 6, South Asian Scripts I, and the graphical decompositions from http://ldc.upenn.edu/myl/IndianScriptsUnicode.html
Modifier and Type | Class and Description |
---|---|
private static class | IndicNormalizer.ScriptData |
Modifier and Type | Field and Description |
---|---|
private static int[][] | decompositions: Decompositions according to Unicode 5.2 and http://ldc.upenn.edu/myl/IndianScriptsUnicode.html. Most of these are not handled by Unicode normalization anyway. |
private static java.util.IdentityHashMap<java.lang.Character.UnicodeBlock,IndicNormalizer.ScriptData> | scripts |
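
The field summary gives only the types of `scripts` and `decompositions`, so the following is a minimal sketch of how such a pair of tables could fit together, not the class's actual layout. The class name `ScriptTableSketch`, the `flag` and `base` members, the row layout, the `lookup` helper, and the single rule shown are all assumptions made for illustration. The rule maps the letter A followed by the vowel sign AA to the letter AA in the Devanagari and Bengali blocks.

```java
import java.lang.Character.UnicodeBlock;
import java.util.IdentityHashMap;

/** Hypothetical illustration only; not the real IndicNormalizer internals. */
final class ScriptTableSketch {

    /** Assumed per-script record: a bit flag plus the first code point of the block. */
    static final class ScriptData {
        final int flag;
        final int base;

        ScriptData(int flag, int base) {
            this.flag = flag;
            this.base = base;
        }
    }

    // UnicodeBlock constants are singletons, so an identity-based map is sufficient.
    static final IdentityHashMap<UnicodeBlock, ScriptData> SCRIPTS = new IdentityHashMap<>();

    static {
        SCRIPTS.put(UnicodeBlock.DEVANAGARI, new ScriptData(1, 0x0900));
        SCRIPTS.put(UnicodeBlock.BENGALI, new ScriptData(2, 0x0980));
    }

    // Assumed row layout: { first offset, second offset, composed offset, script flags }.
    // Offsets are relative to the start of the script's block.
    static final int[][] DECOMPOSITIONS = {
        { 0x05, 0x3E, 0x06, 1 | 2 }, // letter A + vowel sign AA -> letter AA
    };

    /** Returns the composed code point, or -1 when no row applies to the pair. */
    static int lookup(int first, int second) {
        UnicodeBlock block = UnicodeBlock.of(first);
        ScriptData sd = SCRIPTS.get(block);
        if (sd == null || UnicodeBlock.of(second) != block) {
            return -1; // unknown script, or the pair spans two blocks
        }
        for (int[] row : DECOMPOSITIONS) {
            boolean appliesToScript = (row[3] & sd.flag) != 0;
            if (appliesToScript && first == sd.base + row[0] && second == sd.base + row[1]) {
                return sd.base + row[2];
            }
        }
        return -1;
    }
}
```

Storing block-relative offsets lets one row serve several scripts whose blocks share the same internal arrangement, with the flag mask selecting which scripts the row covers.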
Constructor and Description |
---|
IndicNormalizer() |
Modifier and Type | Method and Description |
---|---|
private int | compose(int ch0, java.lang.Character.UnicodeBlock block0, IndicNormalizer.ScriptData sd, char[] text, int pos, int len): Compose into standard form any compositions in the decompositions table. |
private static int | flag(java.lang.Character.UnicodeBlock ub) |
int | normalize(char[] text, int len): Normalizes input text, and returns the new length. |
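
The `normalize(char[] text, int len)` signature implies an in-place contract: the caller passes a buffer and its valid length, and reads back only as many characters as the returned length. The sketch below illustrates that contract with a hypothetical stand-in class, `InPlaceNormalizeSketch`, which applies a single hard-coded rule (U+0905 followed by U+093E becomes U+0906). It is an assumption-laden illustration, not the real implementation, which is table-driven.

```java
/** Hypothetical stand-in that mimics the normalize(char[], int) contract. */
final class InPlaceNormalizeSketch {

    private static final char LETTER_A = '\u0905';      // DEVANAGARI LETTER A
    private static final char VOWEL_SIGN_AA = '\u093E'; // DEVANAGARI VOWEL SIGN AA
    private static final char LETTER_AA = '\u0906';     // DEVANAGARI LETTER AA

    /** Rewrites text[0..len) in place and returns the new valid length. */
    static int normalize(char[] text, int len) {
        for (int i = 0; i < len; i++) {
            if (text[i] == LETTER_A && i + 1 < len && text[i + 1] == VOWEL_SIGN_AA) {
                text[i] = LETTER_AA;
                // Shift the tail left by one to remove the consumed vowel sign.
                System.arraycopy(text, i + 2, text, i + 1, len - (i + 2));
                len--;
            }
        }
        return len;
    }

    public static void main(String[] args) {
        char[] buffer = { '\u0905', '\u093E', '\u092E' };
        int newLen = normalize(buffer, buffer.length);
        // Only the first newLen characters are meaningful after the call.
        System.out.println(new String(buffer, 0, newLen));
    }
}
```

Characters beyond the returned length are left stale in the buffer, so callers should always slice with the returned value instead of the array length.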
private static final java.util.IdentityHashMap<java.lang.Character.UnicodeBlock,IndicNormalizer.ScriptData> scripts
private static final int[][] decompositions
private static int flag(java.lang.Character.UnicodeBlock ub)
public int normalize(char[] text, int len)
text - input text
len - valid length
private int compose(int ch0, java.lang.Character.UnicodeBlock block0, IndicNormalizer.ScriptData sd, char[] text, int pos, int len)