public class WordlistLoader
extends java.lang.Object
Loader for text files that represent a list of stopwords.
See Also: IOUtils to obtain Reader instances
Modifier and Type | Field and Description |
---|---|
private static int | INITIAL_CAPACITY |
Modifier | Constructor and Description |
---|---|
private | WordlistLoader(): no instance |
Modifier and Type | Method and Description |
---|---|
private static java.io.BufferedReader | getBufferedReader(java.io.Reader reader) |
static java.util.List<java.lang.String> | getLines(java.io.InputStream stream, java.nio.charset.Charset charset): Accesses a resource by name and returns the (non-comment) lines containing data using the given character encoding. |
static CharArraySet | getSnowballWordSet(java.io.Reader reader): Reads stopwords from a stopword list in Snowball format. |
static CharArraySet | getSnowballWordSet(java.io.Reader reader, CharArraySet result): Reads stopwords from a stopword list in Snowball format. |
static CharArrayMap<java.lang.String> | getStemDict(java.io.Reader reader, CharArrayMap<java.lang.String> result): Reads a stem dictionary. |
static CharArraySet | getWordSet(java.io.Reader reader): Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | getWordSet(java.io.Reader reader, CharArraySet result): Reads lines from a Reader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | getWordSet(java.io.Reader reader, java.lang.String comment): Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
static CharArraySet | getWordSet(java.io.Reader reader, java.lang.String comment, CharArraySet result): Reads lines from a Reader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). |
private static final int INITIAL_CAPACITY
public static CharArraySet getWordSet(java.io.Reader reader, CharArraySet result) throws java.io.IOException
Parameters:
reader - Reader containing the wordlist
result - the CharArraySet to fill with the reader's words
Returns:
CharArraySet with the reader's words
Throws:
java.io.IOException
public static CharArraySet getWordSet(java.io.Reader reader) throws java.io.IOException
Parameters:
reader - Reader containing the wordlist
Returns:
CharArraySet with the reader's words
Throws:
java.io.IOException
public static CharArraySet getWordSet(java.io.Reader reader, java.lang.String comment) throws java.io.IOException
Parameters:
reader - Reader containing the wordlist
comment - The string representing a comment.
Throws:
java.io.IOException
public static CharArraySet getWordSet(java.io.Reader reader, java.lang.String comment, CharArraySet result) throws java.io.IOException
Parameters:
reader - Reader containing the wordlist
comment - The string representing a comment.
result - the CharArraySet to fill with the reader's words
Returns:
CharArraySet with the reader's words
Throws:
java.io.IOException
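The comment-aware getWordSet behaviour described above (one word per line, surrounding whitespace dropped, comment lines skipped) can be sketched with JDK classes only. This is an illustrative stand-in, not Lucene's implementation: the class `WordSetSketch`, the method `readWordSet`, and the use of `java.util.Set<String>` in place of CharArraySet are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: java.util.Set<String> stands in for Lucene's CharArraySet.
class WordSetSketch {

    // Adds every line that does not start with the comment prefix to result,
    // after trimming leading and trailing whitespace.
    static Set<String> readWordSet(Reader reader, String comment, Set<String> result)
            throws IOException {
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.startsWith(comment)) {
                    result.add(line.trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String words = "# stopwords\n  the \nand\n";
        Set<String> set = readWordSet(new StringReader(words), "#", new LinkedHashSet<String>());
        System.out.println(set); // prints [the, and]
    }
}
```

Passing the target set in as a parameter mirrors the `result` overloads: the caller decides which collection is filled, and the same instance is returned.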
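public static CharArraySet getSnowballWordSet(java.io.Reader reader, CharArraySet result) throws java.io.IOException
The Snowball format is the following:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Parameters:
reader - Reader containing a Snowball stopword list
result - the CharArraySet to fill with the reader's words
Returns:
CharArraySet with the reader's words
Throws:
java.io.IOException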
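public static CharArraySet getSnowballWordSet(java.io.Reader reader) throws java.io.IOException
The Snowball format is the following:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Parameters:
reader - Reader containing a Snowball stopword list
Returns:
CharArraySet with the reader's words
Throws:
java.io.IOException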
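A minimal sketch of Snowball-style stopword parsing, assuming the usual conventions of that format: words separated by whitespace, and a vertical line (|) starting a comment that runs to the end of the line. The class `SnowballSketch` and method `readSnowballWordSet` are hypothetical names, and `java.util.Set<String>` is used here instead of CharArraySet so that the example needs only the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: not Lucene's code, and Set<String> replaces CharArraySet.
class SnowballSketch {

    static Set<String> readSnowballWordSet(Reader reader, Set<String> result)
            throws IOException {
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Drop everything from the first '|' onwards (trailing comment).
                int commentStart = line.indexOf('|');
                if (commentStart >= 0) {
                    line = line.substring(0, commentStart);
                }
                // A line may hold several whitespace-separated words.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        result.add(word);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String list = "| Danish stop words\nog      | and\ni jeg  | in, I\n\n";
        Set<String> set = readSnowballWordSet(new StringReader(list), new LinkedHashSet<String>());
        System.out.println(set); // prints [og, i, jeg]
    }
}
```

Lines that consist only of a comment or of whitespace contribute nothing, because the empty token produced by the split is filtered out.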
public static CharArrayMap<java.lang.String> getStemDict(java.io.Reader reader, CharArrayMap<java.lang.String> result) throws java.io.IOException
Each line contains: word\tstem (i.e. two tab-separated words)
Throws:
java.io.IOException - If there is a low-level I/O error.
public static java.util.List<java.lang.String> getLines(java.io.InputStream stream, java.nio.charset.Charset charset) throws java.io.IOException
A comment line is any line that starts with the character "#".
Throws:
java.io.IOException - If there is a low-level I/O error.
private static java.io.BufferedReader getBufferedReader(java.io.Reader reader)
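As a rough illustration of the two line formats described for getLines and getStemDict, the sketch below filters out "#" comment lines and then splits each remaining line at the first tab. It uses JDK classes only: `StemDictSketch`, `readLines`, and `readStemDict` are hypothetical names, `java.util.Map<String, String>` stands in for CharArrayMap, and skipping blank or malformed lines is an assumption of this example rather than documented behaviour.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Lucene's code, and Map<String, String> replaces CharArrayMap.
class StemDictSketch {

    // Returns the lines of the stream, decoded with the given charset,
    // skipping lines that start with '#' (and, by assumption, blank lines).
    static List<String> readLines(InputStream stream, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith("#") || line.trim().isEmpty()) {
                    continue;
                }
                lines.add(line);
            }
        }
        return lines;
    }

    // Reads "word<TAB>stem" lines into result; lines without a tab are ignored (assumption).
    static Map<String, String> readStemDict(Reader reader, Map<String, String> result)
            throws IOException {
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] wordStem = line.split("\t", 2);
                if (wordStem.length == 2) {
                    result.put(wordStem[0], wordStem[1]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "# irregular forms\nmice\tmouse\n\nfeet\tfoot\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(lines.size()); // prints 2
        Map<String, String> dict = readStemDict(
            new StringReader(String.join("\n", lines)), new LinkedHashMap<String, String>());
        System.out.println(dict); // prints {mice=mouse, feet=foot}
    }
}
```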