public final class DictionaryLookup extends Object implements IStemmer, Iterable<WordData>
Important: finite state automata in Jan Daciuk's implementation operate on bytes, not Unicode characters. Therefore, objects of this class must always be constructed with an encoding used to convert Java strings to byte arrays and back. UTF-8 is a safe choice, as it should not conflict with any control sequences or separator characters.
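The following minimal, self-contained sketch makes the byte-level point concrete without using the library. It assumes a hypothetical `+` separator and a simplified, uncompressed entry layout (word, separator, stem, separator, tag). Real dictionaries declare their own separator in their metadata and store the stem in compressed form. The sketch shows two things. First, a UTF-8 encoded entry is longer in bytes than in characters. Second, non-ASCII letters never produce a byte equal to an ASCII separator, because every byte of a multi-byte UTF-8 sequence is 0x80 or above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical separator; real dictionaries declare their own in their metadata.
    static final char SEPARATOR = '+';

    /** Encodes a simplified, uncompressed entry "word+stem+tag" to bytes (assumed layout). */
    static byte[] encodeEntry(String word, String stem, String tag, Charset charset) {
        return (word + SEPARATOR + stem + SEPARATOR + tag).getBytes(charset);
    }

    /** Counts bytes equal to the separator; '+' is ASCII, so UTF-8 encodes it as the single byte 0x2B. */
    static int countSeparatorBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if (b == (byte) SEPARATOR) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Polish "turtles" (instrumental plural) and its lemma, written with Unicode escapes
        // so the source compiles under any platform encoding.
        String word = "\u017C\u00F3\u0142wiami";
        String stem = "\u017C\u00F3\u0142w";
        byte[] utf8 = encodeEntry(word, stem, "subst:pl:inst:m2", StandardCharsets.UTF_8);
        // 30 characters but 36 bytes: the six non-ASCII letters take 2 bytes each in UTF-8.
        System.out.println(utf8.length);
        // Only the two real separators match; multi-byte sequences never contain 0x2B.
        System.out.println(countSeparatorBytes(utf8));
        // Decoding with the same charset restores the original string.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Suppose the same entry were encoded with a charset that cannot represent these letters, such as US-ASCII. `String.getBytes(Charset)` would silently substitute a replacement byte. This is one reason the encoding must match the one the dictionary was built with.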
Constructor | Description |
---|---|
DictionaryLookup(Dictionary dictionary) | Creates a new object of this class using the given dictionary, which supplies the FSA for word lookups and the encoding for converting characters to bytes. |
Modifier and Type | Method | Description |
---|---|---|
static String | applyReplacements(CharSequence word, LinkedHashMap<String,String> replacements) | Applies partial string replacements from a given map. |
Dictionary | getDictionary() | Returns the Dictionary used by this object. |
char | getSeparatorChar() | Returns the separator character. |
Iterator<WordData> | iterator() | Returns an iterator over all WordData entries available in the embedded Dictionary. |
List<WordData> | lookup(CharSequence word) | Searches the automaton for a symbol sequence equal to word, followed by a separator. |
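The lookup row in the method summary says the automaton is searched for `word` followed by a separator. The separator matters: without it, looking up "cat" would also hit entries for "cats" or "catalog". The sketch below illustrates only that matching rule and is not how the library works internally. It replaces the FSA with a plain sorted set (`java.util.TreeSet`) and keeps stems uncompressed. It also uses a hypothetical `+` separator and a hypothetical `Entry` class in place of `WordData`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class ToyLookup {
    // Hypothetical separator; a real dictionary declares its own.
    static final char SEPARATOR = '+';

    /** Hypothetical stand-in for WordData: a stem and an optional tag. */
    static final class Entry {
        final String stem;
        final String tag; // null when the entry has no tag

        Entry(String stem, String tag) {
            this.stem = stem;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return tag == null ? stem : stem + "/" + tag;
        }
    }

    // Sorted set of "word+stem[+tag]" sequences, standing in for the automaton.
    private final NavigableSet<String> sequences = new TreeSet<>();

    void add(String word, String stem, String tag) {
        String seq = word + SEPARATOR + stem;
        sequences.add(tag == null ? seq : seq + SEPARATOR + tag);
    }

    /** Returns every entry whose sequence starts with word immediately followed by the separator. */
    List<Entry> lookup(CharSequence word) {
        String prefix = word.toString() + SEPARATOR;
        List<Entry> result = new ArrayList<>();
        // All sequences sharing the prefix are contiguous in sorted order, so scan
        // forward from the first one >= prefix until the prefix stops matching.
        for (String seq : sequences.tailSet(prefix, true)) {
            if (!seq.startsWith(prefix)) {
                break;
            }
            String rest = seq.substring(prefix.length());
            int sep = rest.indexOf(SEPARATOR);
            result.add(sep < 0
                    ? new Entry(rest, null)
                    : new Entry(rest.substring(0, sep), rest.substring(sep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyLookup dict = new ToyLookup();
        dict.add("cat", "cat", "NN");
        dict.add("cat", "cat", "VB");
        dict.add("cats", "cat", "NNS");
        dict.add("catalog", "catalog", "NN");
        System.out.println(dict.lookup("cat"));  // prints [cat/NN, cat/VB]
        System.out.println(dict.lookup("cats")); // prints [cat/NNS]
        System.out.println(dict.lookup("dog"));  // prints []
    }
}
```

Appending `SEPARATOR` to the prefix is what keeps `lookup("cat")` from also returning the "cats" and "catalog" entries.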
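Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator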
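The `applyReplacements` entry in the method summary describes partial string replacements taken from a `LinkedHashMap`. The documentation does not spell out the exact semantics, so the following is a hypothetical re-implementation under one plausible assumption. Under that assumption, pairs are applied in the map's insertion order, and each pair replaces every non-overlapping occurrence of its key. Entry order can then change the result. That would explain why the parameter is an ordered `LinkedHashMap` rather than a plain `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplacementsDemo {
    /**
     * Hypothetical sketch, not the library's code: applies each (from -> to) pair in
     * insertion order; String.replace rewrites every non-overlapping occurrence, left to right.
     */
    static String applyReplacements(CharSequence word, LinkedHashMap<String, String> replacements) {
        String result = word.toString();
        for (Map.Entry<String, String> pair : replacements.entrySet()) {
            result = result.replace(pair.getKey(), pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> aFirst = new LinkedHashMap<>();
        aFirst.put("a", "b");
        aFirst.put("b", "c");

        LinkedHashMap<String, String> bFirst = new LinkedHashMap<>();
        bFirst.put("b", "c");
        bFirst.put("a", "b");

        // Same pairs, different order, different results.
        System.out.println(applyReplacements("ab", aFirst)); // "ab" -> "bb" -> "cc"
        System.out.println(applyReplacements("ab", bFirst)); // "ab" -> "ac" -> "bc"
    }
}
```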
public DictionaryLookup(Dictionary dictionary) throws IllegalArgumentException
Parameters:
dictionary - The dictionary to use for lookups.
Throws:
IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
public List<WordData> lookup(CharSequence word)
Searches the automaton for a symbol sequence equal to word, followed by a separator. The result is a stem (decompressed according to the dictionary's specification) and optional tag data.
public static String applyReplacements(CharSequence word, LinkedHashMap<String,String> replacements)
Applies partial string replacements from a given map.
Parameters:
word - The word to apply replacements to.
replacements - A map of replacements (from -> to).
public Iterator<WordData> iterator()
Returns an iterator over all WordData entries available in the embedded Dictionary.
public Dictionary getDictionary()
Returns:
The Dictionary used by this object.
public char getSeparatorChar()
Returns:
The separator character. It is converted from the byte stored in DictionaryMetadata.separator, so it may not be valid in the target encoding (although this is highly unlikely).
Copyright © 2016. All rights reserved.