public class CachingNaiveBayesClassifier extends SimpleNaiveBayesClassifier
A Lucene-based NaiveBayes classifier with internal caching; see http://en.wikipedia.org/wiki/Naive_Bayes_classifier
This is NOT an online classifier.
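As a rough illustration of what a batch-trained (non-online) Naive Bayes classifier does, the following is a minimal, self-contained sketch in plain Java. It is not the Lucene implementation: the class `ToyNaiveBayes`, its whitespace tokenization, and the add-one smoothing are all assumptions made for this example. Counts are gathered once from a fixed labeled corpus, and classification fails if training has not happened.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy multinomial Naive Bayes; illustrative only, not Lucene code. */
final class ToyNaiveBayes {
  private final Map<String, Integer> docsPerClass = new HashMap<>();
  private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
  private final Map<String, Integer> tokensPerClass = new HashMap<>();
  private final Set<String> vocabulary = new HashSet<>();
  private int totalDocs = 0;

  /** Batch training: each entry is {classLabel, text}; counts come from the whole corpus. */
  void train(String[][] labeledDocs) {
    for (String[] doc : labeledDocs) {
      String label = doc[0];
      docsPerClass.merge(label, 1, Integer::sum);
      totalDocs++;
      Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
      // Assumption: lowercase + whitespace split stands in for a real Analyzer.
      for (String term : doc[1].toLowerCase().split("\\s+")) {
        counts.merge(term, 1, Integer::sum);
        tokensPerClass.merge(label, 1, Integer::sum);
        vocabulary.add(term);
      }
    }
  }

  /** Returns the class with the highest log prior plus summed log term likelihoods. */
  String assignClass(String text) {
    if (totalDocs == 0) {
      throw new IllegalStateException("call train() before classifying");
    }
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String label : docsPerClass.keySet()) {
      double score = Math.log((double) docsPerClass.get(label) / totalDocs);
      Map<String, Integer> counts = termCounts.get(label);
      // Add-one (Laplace) smoothing so unseen terms do not zero out a class.
      int denominator = tokensPerClass.getOrDefault(label, 0) + vocabulary.size();
      for (String term : text.toLowerCase().split("\\s+")) {
        int count = counts.getOrDefault(term, 0);
        score += Math.log((count + 1.0) / denominator);
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}
```

Working in log space avoids numeric underflow when many term probabilities are multiplied together.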
Fields inherited from class SimpleNaiveBayesClassifier: analyzer, classFieldName, indexSearcher, leafReader, query, textFieldNames
Constructor and Description |
---|
CachingNaiveBayesClassifier() Creates a new NaiveBayes classifier with internal caching. |
Modifier and Type | Method and Description |
---|---|
void | reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) Builds the frame of the cache. |
void | train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query) Trains the classifier using the underlying Lucene index. |
void | train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer) Trains the classifier using the underlying Lucene index. |
void | train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query) Trains the classifier using the underlying Lucene index. |
Methods inherited from class SimpleNaiveBayesClassifier: assignClass, countDocsWithClass, getClasses, getClasses, tokenizeDoc
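To make the two `reInitCache` parameters more concrete, here is a small plain-Java sketch of one plausible reading of them. It is not taken from the Lucene source: the class `TermCacheSketch`, the in-memory `indexFreq` map standing in for index statistics, and the `>=` threshold comparison are all assumptions for illustration. A higher `minTermOccurrenceInCache` keeps fewer terms in the cache, and `justCachedTerms` decides whether uncached terms are looked up the slow way or ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical model of a term-frequency cache; illustrative only, not Lucene code. */
final class TermCacheSketch {
  // Stand-in for per-term statistics that would normally come from the index.
  private final Map<String, Integer> indexFreq;
  private Map<String, Integer> cache = new HashMap<>();
  private boolean justCachedTerms = false;

  TermCacheSketch(Map<String, Integer> indexFreq) {
    this.indexFreq = new HashMap<>(indexFreq);
  }

  /** Rebuilds the cache, keeping only terms at or above the threshold (assumed comparison). */
  void reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) {
    Map<String, Integer> fresh = new HashMap<>();
    for (Map.Entry<String, Integer> entry : indexFreq.entrySet()) {
      if (entry.getValue() >= minTermOccurrenceInCache) {
        fresh.put(entry.getKey(), entry.getValue());
      }
    }
    this.cache = fresh;
    this.justCachedTerms = justCachedTerms;
  }

  int cacheSize() {
    return cache.size();
  }

  /** Returns the term frequency, or empty if the term is unknown or excluded. */
  OptionalInt frequency(String term) {
    Integer cached = cache.get(term);
    if (cached != null) {
      return OptionalInt.of(cached);
    }
    if (justCachedTerms) {
      return OptionalInt.empty(); // uncached terms are ignored entirely
    }
    Integer slow = indexFreq.get(term); // models the slower, uncached lookup
    return slow == null ? OptionalInt.empty() : OptionalInt.of(slow);
  }
}
```

In this sketch the trade-off is memory against coverage: raising the threshold shrinks the cache, and enabling `justCachedTerms` additionally drops rare terms from scoring rather than paying for a lookup.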
public CachingNaiveBayesClassifier()
Creates a new NaiveBayes classifier with internal caching. You must call train() before you can classify any documents. To use less memory, you can call reInitCache().
public void train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer) throws IOException
Specified by: train in interface Classifier<BytesRef>
Overrides: train in class SimpleNaiveBayesClassifier
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldName - the name of the field used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
Throws:
IOException - If there is a low-level I/O error.
public void train(LeafReader leafReader, String textFieldName, String classFieldName, Analyzer analyzer, Query query) throws IOException
Specified by: train in interface Classifier<BytesRef>
Overrides: train in class SimpleNaiveBayesClassifier
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldName - the name of the field used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
query - the query that selects which documents to use for training
Throws:
IOException - If there is a low-level I/O error.
public void train(LeafReader leafReader, String[] textFieldNames, String classFieldName, Analyzer analyzer, Query query) throws IOException
Specified by: train in interface Classifier<BytesRef>
Overrides: train in class SimpleNaiveBayesClassifier
Parameters:
leafReader - the reader to use to access the Lucene index
textFieldNames - the names of the fields to be used to compare documents
classFieldName - the name of the field containing the class assigned to documents
analyzer - the analyzer used to tokenize / filter the unseen text
query - the query that selects which documents to use for training
Throws:
IOException - If there is a low-level I/O error.
public void reInitCache(int minTermOccurrenceInCache, boolean justCachedTerms) throws IOException
Parameters:
minTermOccurrenceInCache - the minimum term occurrence for caching; a higher value gives a smaller cache.
justCachedTerms - the switch to fully exclude low-occurrence docs.
Throws:
IOException - If there is a low-level I/O error.
Copyright © 2000–2015 The Apache Software Foundation. All rights reserved.