class WordSegmenter
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
private HHMMSegmenter | hhmmSegmenter |
private SegTokenFilter | tokenFilter |
Constructor and Description |
---|
WordSegmenter() |
Modifier and Type | Method and Description |
---|---|
SegToken | convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset) Process a SegToken so that it is ready for indexing. |
java.util.List<SegToken> | segmentSentence(java.lang.String sentence, int startOffset) Segment a sentence into words with HHMMSegmenter. |
private HHMMSegmenter hhmmSegmenter
private SegTokenFilter tokenFilter
public java.util.List<SegToken> segmentSentence(java.lang.String sentence, int startOffset)
Segment a sentence into words with HHMMSegmenter.
Parameters:
sentence - input sentence
startOffset - start offset of sentence
Returns:
List of SegToken
public SegToken convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset)
Process a SegToken so that it is ready for indexing.
This method calculates offsets and normalizes the token with SegTokenFilter.
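The offset calculation described above can be illustrated with a small, self-contained sketch. It assumes that tokens produced for a single sentence carry offsets relative to that sentence, and that making them ready for indexing means shifting them by the sentence's start offset within the document. The `OffsetSketch` and `Token` names are hypothetical stand-ins, not the library's `SegToken` or `WordSegmenter`, the segmentation is written by hand rather than produced by `HHMMSegmenter`, and the normalization step performed by `SegTokenFilter` is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {

    /** Hypothetical stand-in for a segmented token; not the library's SegToken. */
    static final class Token {
        final String text;
        final int startOffset; // inclusive
        final int endOffset;   // exclusive

        Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Shift a sentence-relative token so its offsets are document-relative. */
    static Token shift(Token t, int sentenceStartOffset) {
        return new Token(t.text,
                t.startOffset + sentenceStartOffset,
                t.endOffset + sentenceStartOffset);
    }

    /** Apply the shift to every token of one sentence. */
    static List<Token> shiftAll(List<Token> tokens, int sentenceStartOffset) {
        List<Token> result = new ArrayList<>();
        for (Token t : tokens) {
            result.add(shift(t, sentenceStartOffset));
        }
        return result;
    }

    public static void main(String[] args) {
        // Unspaced Latin text stands in for text without word delimiters.
        String document = "goodday.helloworld";
        int sentenceStartOffset = 8; // the second sentence begins after "goodday."

        // Hand-written segmentation of the second sentence (assumed segmenter output).
        List<Token> sentenceTokens = new ArrayList<>();
        sentenceTokens.add(new Token("hello", 0, 5));
        sentenceTokens.add(new Token("world", 5, 10));

        for (Token t : shiftAll(sentenceTokens, sentenceStartOffset)) {
            // After shifting, the offsets index directly into the full document.
            System.out.println(t.text + " -> "
                    + document.substring(t.startOffset, t.endOffset));
        }
    }
}
```

Under these assumptions, each shifted token's offsets select exactly its own text from the full document, which is the property an indexer relies on for highlighting and position lookups.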