abstract class AbstractDictionary
extends java.lang.Object
SmartChineseAnalyzer abstract dictionary implementation.
Contains methods for dealing with GB2312 encoding.
Modifier and Type | Field and Description |
---|---|
static int | CHAR_NUM_IN_FILE: Dictionary data contains 6768 Chinese characters with frequency statistics. |
static int | GB2312_CHAR_NUM: Last Chinese character in GB2312 (87 * 94). |
static int | GB2312_FIRST_CHAR: First Chinese character in GB2312 (15 * 94). Characters in GB2312 are arranged in a 94 * 94 grid; rows 0-14 contain punctuation or are unassigned. |
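Both grid constants are row offsets into the 94 * 94 grid, so an ID can be read as zero-based row * 94 + column. The three constants are also consistent with one another: GB2312_CHAR_NUM - GB2312_FIRST_CHAR = 72 * 94 = 6768 = CHAR_NUM_IN_FILE. In other words, the dictionary appears to cover 72 complete rows, and the 5 cells beyond GB2312's 6763 Chinese characters would be unassigned positions. The sketch below treats GB2312_CHAR_NUM as an exclusive upper bound. That is an assumption suggested by this arithmetic, not a documented contract. The class Gb2312Grid and its helpers rowOf, colOf and isChineseId are hypothetical.

```java
/**
 * Sketch of the 94 x 94 GB2312 grid arithmetic implied by the field constants.
 * Hypothetical helper class; not part of AbstractDictionary.
 */
final class Gb2312Grid {
    // Values as documented: 15 * 94, 87 * 94 and 6768.
    static final int GB2312_FIRST_CHAR = 15 * 94;
    static final int GB2312_CHAR_NUM = 87 * 94;
    static final int CHAR_NUM_IN_FILE = 6768;

    /** Zero-based grid row of a GB2312 ID. */
    static int rowOf(int id) {
        return id / 94;
    }

    /** Zero-based grid column of a GB2312 ID. */
    static int colOf(int id) {
        return id % 94;
    }

    /**
     * True if the ID lies in the Chinese-character rows (15..86, zero-based).
     * Assumption: GB2312_CHAR_NUM is an exclusive upper bound.
     */
    static boolean isChineseId(int id) {
        return id >= GB2312_FIRST_CHAR && id < GB2312_CHAR_NUM;
    }

    public static void main(String[] args) {
        System.out.println(GB2312_FIRST_CHAR);                   // 1410
        System.out.println(GB2312_CHAR_NUM);                     // 8178
        System.out.println(GB2312_CHAR_NUM - GB2312_FIRST_CHAR); // 6768, i.e. 72 rows of 94
        System.out.println(rowOf(1410) + "," + colOf(1410));     // 15,0
        System.out.println(isChineseId(1409));                   // false
    }
}
```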
Constructor and Description |
---|
AbstractDictionary() |
Modifier and Type | Method and Description |
---|---|
java.lang.String | getCCByGB2312Id(int ccid): Transcode from GB2312 ID to Unicode. |
short | getGB2312Id(char ch): Transcode from Unicode to GB2312. |
long | hash1(char c): 32-bit FNV hash function. |
long | hash1(char[] carray): 32-bit FNV hash function. |
int | hash2(char c): djb2 hash algorithm; this algorithm (k=33) was first reported by Dan Bernstein many years ago in comp.lang.c. |
int | hash2(char[] carray): djb2 hash algorithm; this algorithm (k=33) was first reported by Dan Bernstein many years ago in comp.lang.c. |
public static final int GB2312_FIRST_CHAR
public static final int GB2312_CHAR_NUM
public static final int CHAR_NUM_IN_FILE
public java.lang.String getCCByGB2312Id(int ccid)
Transcode from GB2312 ID to Unicode.
GB2312 is divided into a 94 * 94 grid containing 7445 characters: 6763 Chinese characters and 682 symbols. Some regions are unassigned (reserved).
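The grid makes transcoding mechanical. In the EUC-CN byte form of GB2312, each of a character's two bytes equals 0xA1 plus its zero-based row or column, so ID = row * 94 + column. The minimal sketch below implements both directions under that assumption. It uses the JDK's "GB2312" charset, which standard JDK builds ship in the jdk.charsets module. The class Gb2312Transcode and its methods idToString and charToId are hypothetical stand-ins for getCCByGB2312Id and getGB2312Id. They are not Lucene's source.

```java
import java.nio.charset.Charset;

/**
 * Sketch of GB2312 ID <-> Unicode transcoding (hypothetical, not Lucene's code).
 * Assumption: ID = row * 94 + column, and each EUC-CN byte = 0xA1 + zero-based index.
 */
final class Gb2312Transcode {
    // "GB2312" comes from the JDK's extended charsets.
    private static final Charset GB2312 = Charset.forName("GB2312");

    /** GB2312 ID -> Unicode string; returns "" for IDs outside the 94 x 94 grid. */
    static String idToString(int id) {
        if (id < 0 || id >= 94 * 94) {
            return "";
        }
        byte[] bytes = {
            (byte) (0xA1 + id / 94), // row byte
            (byte) (0xA1 + id % 94)  // column byte
        };
        return new String(bytes, GB2312);
    }

    /** Unicode char -> GB2312 ID; returns -1 unless the char encodes to two bytes. */
    static short charToId(char ch) {
        byte[] bytes = String.valueOf(ch).getBytes(GB2312);
        if (bytes.length != 2) {
            return -1; // Basic Latin (single byte) or unmappable (replaced by '?')
        }
        int row = (bytes[0] & 0xFF) - 0xA1;
        int col = (bytes[1] & 0xFF) - 0xA1;
        return (short) (row * 94 + col);
    }

    public static void main(String[] args) {
        short id = charToId('\u554A');                          // encoded as B0 A1
        System.out.println(id);                                 // 1410
        System.out.println(idToString(id).equals("\u554A"));    // true
        System.out.println(charToId('A'));                      // -1
    }
}
```

Under these assumptions, the character U+554A (bytes B0 A1) maps to ID 1410, which equals GB2312_FIRST_CHAR. Returning -1 for single-byte input such as Basic Latin is this sketch's own choice.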
ccid - GB2312 id
public short getGB2312Id(char ch)
Transcode from Unicode to GB2312.
ch - input character in Unicode, or a character in the Basic Latin range.
public long hash1(char c)
32-bit FNV hash function.
c - input character
public long hash1(char[] carray)
32-bit FNV hash function.
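For reference, the textbook 32-bit FNV-1a loop uses offset basis 2166136261 and prime 16777619. For each input byte it XORs the byte into the hash, then multiplies, working modulo 2^32. The sketch below feeds each char as two bytes, low byte first. That is an assumption of the sketch, as is the class name Fnv32. The constants and mixing used by AbstractDictionary itself may differ.

```java
/**
 * Textbook 32-bit FNV-1a over chars (a sketch, not Lucene's source).
 * Assumption: each char is fed as two bytes, low byte first.
 */
final class Fnv32 {
    private static final long OFFSET_BASIS = 0x811C9DC5L; // 2166136261
    private static final long PRIME = 0x01000193L;        // 16777619
    private static final long MASK_32 = 0xFFFFFFFFL;      // keep results modulo 2^32

    /** Hashes a char array; the result is an unsigned 32-bit value held in a long. */
    static long hash1(char[] carray) {
        long hash = OFFSET_BASIS;
        for (char c : carray) {
            hash = ((hash ^ (c & 0xFF)) * PRIME) & MASK_32; // low byte: XOR, then multiply
            hash = ((hash ^ (c >>> 8)) * PRIME) & MASK_32;  // high byte
        }
        return hash;
    }

    /** Single-char convenience overload. */
    static long hash1(char c) {
        return hash1(new char[] {c});
    }

    public static void main(String[] args) {
        System.out.println(hash1(new char[0]));                    // 2166136261
        System.out.println(hash1('a') == hash1(new char[] {'a'})); // true
    }
}
```

Masking after every multiply keeps the running value below 2^32, so the product with the 25-bit prime never overflows a long.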
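carray - character array
public int hash2(char c)
djb2 hash algorithm; this algorithm (k=33) was first reported by Dan Bernstein many years ago in comp.lang.c.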
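c - character
public int hash2(char[] carray)
djb2 hash algorithm; this algorithm (k=33) was first reported by Dan Bernstein many years ago in comp.lang.c.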
carray - character array
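djb2 starts from 5381. For each input unit, it multiplies the running hash by 33 (computed as (hash << 5) + hash) and adds the unit. The minimal sketch below assumes each char's full 16-bit value is added in one step. The class name Djb2 is hypothetical, and AbstractDictionary's own per-char handling may differ.

```java
/**
 * Textbook djb2 (k = 33) over chars (a sketch, not Lucene's source).
 * Assumption: each char's full 16-bit value is added in one step.
 */
final class Djb2 {
    static int hash2(char[] carray) {
        int hash = 5381;
        for (char c : carray) {
            hash = ((hash << 5) + hash) + c; // hash * 33 + c; int overflow wraps
        }
        return hash;
    }

    /** Single-char convenience overload. */
    static int hash2(char c) {
        return hash2(new char[] {c});
    }

    public static void main(String[] args) {
        System.out.println(hash2(new char[0]));        // 5381
        System.out.println(hash2('a'));                // 177670
        System.out.println(hash2("ab".toCharArray())); // 5863208
    }
}
```

For example, hash2('a') is 5381 * 33 + 97 = 177670. Appending 'b' gives 177670 * 33 + 98 = 5863208.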