public class SpellChecker
extends java.lang.Object
implements java.io.Closeable
Spell Checker class (Main class).
(initially inspired by the David Spencer code).
Example Usage:
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index (analyzer is any Analyzer supplied by the caller;
// false skips the full merge of the spell index):
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field), new IndexWriterConfig(analyzer), false);
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(Paths.get("myfile.txt")), new IndexWriterConfig(analyzer), false);
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
Modifier and Type | Field and Description |
---|---|
private float | accuracy |
private float | bEnd |
private float | bStart: Boost value for start and end grams. |
private boolean | closed |
private java.util.Comparator<SuggestWord> | comparator |
static float | DEFAULT_ACCURACY: The default minimum score to use, if not specified by calling setAccuracy(float). |
static java.lang.String | F_WORD: Field name for each word in the ngram index. |
private java.lang.Object | modifyCurrentIndexLock |
private StringDistance | sd |
private IndexSearcher | searcher |
private java.lang.Object | searcherLock |
(package private) Directory | spellIndex: The spell index. |
Constructor and Description |
---|
SpellChecker(Directory spellIndex): Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. |
SpellChecker(Directory spellIndex, StringDistance sd): Use the given directory as a spell checker index. |
SpellChecker(Directory spellIndex, StringDistance sd, java.util.Comparator<SuggestWord> comparator): Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results. |
Modifier and Type | Method and Description |
---|---|
private static void | add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value): Add a clause to a boolean query. |
private static void | add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value, float boost): Add a clause to a boolean query. |
private static void | addGram(java.lang.String text, Document doc, int ng1, int ng2) |
void | clearIndex(): Removes all terms from the spell check index. |
void | close(): Close the IndexSearcher used by this SpellChecker. |
private static Document | createDocument(java.lang.String text, int ng1, int ng2) |
(package private) IndexSearcher | createSearcher(Directory dir): Creates a new read-only IndexSearcher. |
private void | ensureOpen() |
boolean | exist(java.lang.String word): Check whether the word exists in the index. |
private static java.lang.String[] | formGrams(java.lang.String text, int ng): Form all ngrams for a given word. |
float | getAccuracy(): The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float), to decide whether a suggestion is included or not. |
java.util.Comparator<SuggestWord> | getComparator(): Gets the comparator in use for ranking suggestions. |
private static int | getMax(int l) |
private static int | getMin(int l) |
StringDistance | getStringDistance(): Returns the StringDistance instance used by this SpellChecker instance. |
void | indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge): Indexes the data from the given Dictionary. |
(package private) boolean | isClosed() |
private IndexSearcher | obtainSearcher() |
private void | releaseSearcher(IndexSearcher aSearcher) |
void | setAccuracy(float acc): Sets the accuracy 0 < minScore < 1; default DEFAULT_ACCURACY. |
void | setComparator(java.util.Comparator<SuggestWord> comparator): Sets the Comparator for the SuggestWordQueue. |
void | setSpellIndex(Directory spellIndexDir): Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor. |
void | setStringDistance(StringDistance sd): Sets the StringDistance implementation for this SpellChecker instance. |
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug): Suggest similar words. |
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, float accuracy): Suggest similar words. |
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode) |
java.lang.String[] | suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode, float accuracy): Suggest similar words (optionally restricted to a field of an index). |
private void | swapSearcher(Directory dir) |

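
The summary for formGrams(String, int) above says it forms all ngrams for a given word. A minimal sketch of that idea in plain Java follows, assuming an n-gram is every contiguous substring of length ng. The class name NGramSketch and the empty-array result for words shorter than ng are choices of this sketch, not the library's code.

```java
import java.util.Arrays;

/** Illustrative stand-in for n-gram formation; not the Lucene implementation. */
class NGramSketch {

    /**
     * Returns every contiguous substring of length ng, in order.
     * Assumption of this sketch: a word shorter than ng yields an empty array.
     */
    static String[] formGrams(String text, int ng) {
        int len = text.length();
        if (ng <= 0 || len < ng) {
            return new String[0];
        }
        String[] grams = new String[len - ng + 1];
        for (int i = 0; i < grams.length; i++) {
            grams[i] = text.substring(i, i + ng);
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [luc, uce, cen, ene]
        System.out.println(Arrays.toString(formGrams("lucene", 3)));
    }
}
```

A word of length n produces n - ng + 1 grams, so a longer ng gives fewer, more selective grams.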
public static final float DEFAULT_ACCURACY
The default minimum score to use, if not specified by calling setAccuracy(float).

public static final java.lang.String F_WORD
Field name for each word in the ngram index.

Directory spellIndex
The spell index.

private float bStart
Boost value for start and end grams.

private float bEnd
private IndexSearcher searcher
private final java.lang.Object searcherLock
private final java.lang.Object modifyCurrentIndexLock
private volatile boolean closed
private float accuracy
private StringDistance sd
private java.util.Comparator<SuggestWord> comparator

public SpellChecker(Directory spellIndex, StringDistance sd) throws java.io.IOException
Use the given directory as a spell checker index.
Parameters:
spellIndex - the spell index directory
sd - the StringDistance measurement to use
Throws:
java.io.IOException - if the SpellChecker cannot open the directory

public SpellChecker(Directory spellIndex) throws java.io.IOException
Use the given directory as a spell checker index with a LevenshteinDistance as the default StringDistance. The directory is created if it doesn't exist yet.
Parameters:
spellIndex - the spell index directory
Throws:
java.io.IOException - if the SpellChecker cannot open the directory

public SpellChecker(Directory spellIndex, StringDistance sd, java.util.Comparator<SuggestWord> comparator) throws java.io.IOException
Use the given directory as a spell checker index with the given StringDistance measure and the given Comparator for sorting the results.
Parameters:
spellIndex - the spelling index
sd - the distance
comparator - the comparator
Throws:
java.io.IOException - if there is a problem opening the index

public void setSpellIndex(Directory spellIndexDir) throws java.io.IOException
Use a different index as the spell checker index or re-open the existing index if spellIndex is the same value as given in the constructor.
Parameters:
spellIndexDir - the spell directory to use
Throws:
AlreadyClosedException - if the SpellChecker is already closed
java.io.IOException - if the SpellChecker cannot open the directory

public void setComparator(java.util.Comparator<SuggestWord> comparator)
Sets the Comparator for the SuggestWordQueue.
Parameters:
comparator - the comparator

public java.util.Comparator<SuggestWord> getComparator()
Gets the comparator in use for ranking suggestions.
See Also:
setComparator(Comparator)

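
The comparator set through setComparator decides how suggestions are ranked. A minimal sketch of one possible ordering follows: by score first, then by frequency. The Suggestion class is a hypothetical stand-in with made-up fields, not Lucene's SuggestWord, and the ordering is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative ranking of suggestions with a pluggable Comparator. */
class ComparatorSketch {

    /** Hypothetical stand-in for a suggestion; not Lucene's SuggestWord. */
    static final class Suggestion {
        final String text;
        final float score;
        final int freq;

        Suggestion(String text, float score, int freq) {
            this.text = text;
            this.score = score;
            this.freq = freq;
        }
    }

    /** The better suggestion compares greater: higher score wins, ties go to higher frequency. */
    static final Comparator<Suggestion> BY_SCORE_THEN_FREQ =
            Comparator.<Suggestion>comparingDouble(s -> s.score).thenComparingInt(s -> s.freq);

    /** Returns the suggestion texts ordered best first according to the comparator. */
    static List<String> rank(List<Suggestion> candidates, Comparator<Suggestion> comparator) {
        List<Suggestion> sorted = new ArrayList<>(candidates);
        sorted.sort(comparator.reversed());
        List<String> texts = new ArrayList<>();
        for (Suggestion s : sorted) {
            texts.add(s.text);
        }
        return texts;
    }
}
```

Swapping in a comparator that looks at frequency before score would favor common words over close matches, which is the kind of trade-off a custom comparator lets the caller make.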
public void setStringDistance(StringDistance sd)
Sets the StringDistance implementation for this SpellChecker instance.
Parameters:
sd - the StringDistance implementation for this SpellChecker instance

public StringDistance getStringDistance()
Returns the StringDistance instance used by this SpellChecker instance.
Returns:
the StringDistance instance used by this SpellChecker instance

public void setAccuracy(float acc)
Sets the accuracy (minimum score), 0 < minScore < 1; the default is DEFAULT_ACCURACY.
Parameters:
acc - the new accuracy

public float getAccuracy()
The accuracy (minimum score) to be used, unless overridden in suggestSimilar(String, int, IndexReader, String, SuggestMode, float), to decide whether a suggestion is included or not.

public java.lang.String[] suggestSimilar(java.lang.String word, int numSug) throws java.io.IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions (a larger numSug) to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, IndexReader, String, SuggestMode, float)

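
The point about numSug can be illustrated with a small re-ranking sketch: candidates arrive in one order (here a made-up retrieval order) and are then re-sorted by edit-distance similarity, so the first candidate retrieved is not necessarily the best match. This is a plain-Java illustration under stated assumptions, not the library's code: RerankSketch is a hypothetical class, and normalizing the Levenshtein distance by the longer word's length is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative re-ranking of retrieved candidates by edit-distance similarity. */
class RerankSketch {

    /** Classic dynamic-programming Levenshtein edit distance using two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1], where 1 means identical. The normalization is an assumption of this sketch. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1f : 1f - (float) editDistance(a, b) / maxLen;
    }

    /** Re-sorts the candidates by similarity to word (best first) and keeps at most numSug. */
    static List<String> rerank(String word, List<String> candidates, int numSug) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort((x, y) -> Float.compare(similarity(word, y), similarity(word, x)));
        return new ArrayList<>(sorted.subList(0, Math.min(numSug, sorted.size())));
    }
}
```

With the hypothetical retrieval order "dispelled", "misspell", "spelt" for the word "misspelt", re-ranking moves "misspell" to the front. Had only the first retrieved candidate been considered, the closer match would have been missed.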
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, float accuracy) throws java.io.IOException
Suggest similar words.
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions (a larger numSug) to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed
See Also:
suggestSimilar(String, int, IndexReader, String, SuggestMode, float)

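
The accuracy parameter is described as the minimum score a suggestion must have to be included in the results. A minimal sketch of such a cutoff follows, assuming scores have already been computed and that a score equal to the threshold is kept (both are assumptions of this sketch). AccuracySketch and filter are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative minimum-score cutoff over already-scored candidates. */
class AccuracySketch {

    /**
     * Keeps candidates whose score is at least minScore, in the map's iteration order,
     * stopping once numSug candidates have been collected.
     */
    static List<String> filter(Map<String, Float> scored, float minScore, int numSug) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Float> entry : scored.entrySet()) {
            if (kept.size() == numSug) {
                break;
            }
            if (entry.getValue() >= minScore) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

Raising the threshold trades recall for precision: fewer suggestions come back, but each is closer to the input word.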
public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode) throws java.io.IOException
Throws:
java.io.IOException

public java.lang.String[] suggestSimilar(java.lang.String word, int numSug, IndexReader ir, java.lang.String field, SuggestMode suggestMode, float accuracy) throws java.io.IOException
Suggest similar words (optionally restricted to a field of an index).
The Lucene similarity used to fetch the most relevant n-grammed terms is not the same as the edit distance strategy used to pick the best-matching spell-checked word from the hits that Lucene found, so one usually has to retrieve several suggestions (a larger numSug) to get the true best match.
That is, if numSug == 1, don't count on that suggestion being the best one. Set this value to at least 5 for a good suggestion.
Parameters:
word - the word you want a spell check done on
numSug - the number of suggested words
ir - the IndexReader of the user index (can be null; see the field parameter)
field - the field of the user index: if field is not null, the suggested words are restricted to the words present in this field
suggestMode - (NOTE: if indexReader==null and/or field==null, then this is overridden with SuggestMode.SUGGEST_ALWAYS)
accuracy - the minimum score a suggestion must have in order to qualify for inclusion in the results
Throws:
java.io.IOException - if the underlying index throws an IOException
AlreadyClosedException - if the SpellChecker is already closed

private static void add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value, float boost)
Add a clause to a boolean query.

private static void add(BooleanQuery.Builder q, java.lang.String name, java.lang.String value)
Add a clause to a boolean query.

private static java.lang.String[] formGrams(java.lang.String text, int ng)
Form all ngrams for a given word.
Parameters:
text - the word to parse
ng - the ngram length, e.g. 3

public void clearIndex() throws java.io.IOException
Removes all terms from the spell check index.
Throws:
java.io.IOException - if there is a low-level I/O error
AlreadyClosedException - if the SpellChecker is already closed

public boolean exist(java.lang.String word) throws java.io.IOException
Check whether the word exists in the index.
Parameters:
word - word to check
Throws:
java.io.IOException - if there is a low-level I/O error
AlreadyClosedException - if the SpellChecker is already closed

public final void indexDictionary(Dictionary dict, IndexWriterConfig config, boolean fullMerge) throws java.io.IOException
Indexes the data from the given Dictionary.
Parameters:
dict - Dictionary to index
config - IndexWriterConfig to use
fullMerge - whether or not the spellcheck index should be fully merged
Throws:
AlreadyClosedException - if the SpellChecker is already closed
java.io.IOException - if there is a low-level I/O error

private static int getMin(int l)
private static int getMax(int l)
private static Document createDocument(java.lang.String text, int ng1, int ng2)
private static void addGram(java.lang.String text, Document doc, int ng1, int ng2)
private IndexSearcher obtainSearcher()
private void releaseSearcher(IndexSearcher aSearcher) throws java.io.IOException
Throws:
java.io.IOException
private void ensureOpen()
public void close() throws java.io.IOException
Close the IndexSearcher used by this SpellChecker.
Specified by:
close in interface java.io.Closeable
Specified by:
close in interface java.lang.AutoCloseable
Throws:
java.io.IOException - if the close operation causes an IOException
AlreadyClosedException - if the SpellChecker is already closed

private void swapSearcher(Directory dir) throws java.io.IOException
Throws:
java.io.IOException

IndexSearcher createSearcher(Directory dir) throws java.io.IOException
Creates a new read-only IndexSearcher.
Parameters:
dir - the directory used to open the searcher
Throws:
java.io.IOException - if there is a low-level IO error

boolean isClosed()
Returns true if and only if the SpellChecker is closed, otherwise false.
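
The closed flag, ensureOpen(), isClosed() and close() together describe a guard pattern: once the instance is closed, further calls fail instead of touching released resources. A minimal sketch of that pattern in plain Java follows, under stated assumptions: ClosableSketch and lookup are hypothetical names, and IllegalStateException stands in for Lucene's AlreadyClosedException so that the example needs no library on the classpath.

```java
import java.io.Closeable;

/** Illustrative closed-state guard; not the SpellChecker implementation. */
class ClosableSketch implements Closeable {

    private final Object lock = new Object();

    // volatile so that a close() on one thread is visible to readers on other threads
    private volatile boolean closed = false;

    /** Fails fast when the instance has already been closed. */
    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("this instance is closed");
        }
    }

    boolean isClosed() {
        return closed;
    }

    /** Hypothetical operation that is only valid while the instance is open. */
    String lookup(String word) {
        ensureOpen();
        return word;
    }

    /** Marks the instance closed; a second call fails, mirroring the documented behavior of close(). */
    @Override
    public void close() {
        synchronized (lock) {
            ensureOpen();
            closed = true;
        }
    }
}
```

Because the sketch implements Closeable, it can be used in a try-with-resources statement, which calls close() exactly once when the block exits.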