Class Redwood::XapianIndex
In: lib/sup/xapian_index.rb
Parent: BaseIndex

This index implementation uses Xapian for searching and storage. It tends to be slightly faster than Ferret for indexing, and significantly faster for searching because thread membership is precomputed.

Methods

Constants

STEM_LANGUAGE = "english"
INDEX_VERSION = '1'
MIN_DATE = Time.at 0   Dates are converted to integers for Xapian and are used for document ids, so we must ensure they're reasonably valid. This typically only affects spam.
MAX_DATE = Time.at(2**31-1)
EACH_ID_PAGE = 100
NORMAL_PREFIX = { 'subject' => 'S', 'body' => 'B', 'from_name' => 'FN', 'to_name' => 'TN', 'name' => 'N', 'attachment' => 'A', }   Stemmed
BOOLEAN_PREFIX = { 'type' => 'K', 'from_email' => 'FE', 'to_email' => 'TE', 'email' => 'E', 'date' => 'D', 'label' => 'L', 'source_id' => 'I', 'attachment_extension' => 'O', 'msgid' => 'Q', 'thread' => 'H', 'ref' => 'R', }   Unstemmed
PREFIX = NORMAL_PREFIX.merge BOOLEAN_PREFIX
MSGID_VALUENO = 0
THREAD_VALUENO = 1
DATE_VALUENO = 2
MAX_TERM_LENGTH = 245
DOCID_SCALE = 2.0**32   Xapian can very efficiently sort in ascending docid order. Sup always wants to sort by descending date, so this method maps between them. In order to handle multiple messages per second, we use a logistic curve centered around MIDDLE_DATE so that the slope (docid/s) is greatest in this time period. A docid collision is not an error - the code will pick the next smallest unused one.
TIME_SCALE = 2.0**27
MIDDLE_DATE = Time.gm(2011)
Q = Xapian::Query
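
The date and docid constants above can be read together as follows. This is a minimal sketch, not the class's actual code: the helper names clamp_date and docid_for and the in-memory Set of used docids are hypothetical stand-ins, and the exact logistic formula is an assumption based on the DOCID_SCALE description. It clamps a date into the MIN_DATE..MAX_DATE range, maps it through a logistic curve centered on MIDDLE_DATE so that newer dates get smaller docids, and steps down to the next unused docid on a collision.

```ruby
require 'set'

# Constants copied from the list above.
MIN_DATE    = Time.at 0
MAX_DATE    = Time.at(2**31-1)
DOCID_SCALE = 2.0**32
TIME_SCALE  = 2.0**27
MIDDLE_DATE = Time.gm(2011)

# Hypothetical helper: force a date into the range that is safe to
# convert to an integer for Xapian.
def clamp_date(date)
  return MIN_DATE if date < MIN_DATE
  return MAX_DATE if date > MAX_DATE
  date
end

# Hypothetical helper: map a date to a docid so that ascending docid
# order corresponds to descending date order. `used` is a Set standing
# in for the index's record of docids already taken (an assumption).
def docid_for(date, used)
  t = (clamp_date(date).to_i - MIDDLE_DATE.to_i).to_f
  # Logistic curve: steepest (most docids per second) around MIDDLE_DATE.
  docid = (DOCID_SCALE - DOCID_SCALE / (Math.exp(-t / TIME_SCALE) + 1)).to_i
  # A collision is not an error: take the next smallest unused docid.
  docid -= 1 while docid > 0 && used.include?(docid)
  return nil unless docid > 0
  used << docid
  docid
end
```

With these values, a message dated exactly MIDDLE_DATE lands at docid 2**31, the midpoint of the docid space, and a second message with the same date takes the docid just below it.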

Public Class methods

Public Instance methods

TODO share code with the Ferret index
