The API and implementation of columns may change in the next version of Whoosh!
This module contains “Column” objects which you can use as the argument to a Field object’s sortable= keyword argument. Each field defines a default column type for when the user specifies sortable=True (the object returned by the field’s default_column() method).
The default column type for most fields is VarBytesColumn, although numeric and date fields use NumericColumn. Expert users may use other column types that may be faster or more storage efficient based on the field contents. For example, if a field always contains one of a limited number of possible values, a RefBytesColumn will save space by only storing each value once. If a field’s values are always a fixed length, the FixedBytesColumn saves space by not storing the length of each value.
A Column object exists mainly to store configuration information. It provides two important methods: writer() to return a ColumnWriter object and reader() to return a ColumnReader object.
Represents a “column” of rows mapping docnums to document values.
The interface requires that you store the start offset of the column, the length of the column data, and the number of documents (rows) separately, and pass them to the reader object.
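The bookkeeping described above can be illustrated with a minimal sketch. It does not use Whoosh's classes: `write_column`, `read_value`, and the names `basepos`, `length`, and `doccount` are hypothetical, and the fixed 4-byte row layout is an assumption chosen only to keep the example short.

```python
import io
import struct

ROW_SIZE = 4  # assumed fixed row width: one little-endian unsigned int


def write_column(buf, values):
    """Append one row per document; return the three facts the caller must keep."""
    basepos = buf.tell()                 # start offset of the column
    for value in values:
        buf.write(struct.pack("<I", value))
    length = buf.tell() - basepos        # length of the column data in bytes
    return basepos, length, len(values)  # len(values) is the row count


def read_value(buf, basepos, length, doccount, docnum):
    """Read the row for docnum using the separately stored facts."""
    if not 0 <= docnum < doccount:
        raise IndexError(docnum)
    buf.seek(basepos + docnum * ROW_SIZE)
    return struct.unpack("<I", buf.read(ROW_SIZE))[0]


buf = io.BytesIO()
buf.write(b"header--")  # 8 bytes of unrelated data ahead of the column
basepos, length, doccount = write_column(buf, [10, 20, 30])
second = read_value(buf, basepos, length, doccount, 1)
```

The column data itself carries no header here, which is why the caller has to store the offset, length, and row count somewhere else and hand them back when reading.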
Returns the default value for this column type.
Returns a ColumnReader object you can use to read a column of this type from disk.
Returns True if the column stores a list of values for each document instead of a single value.
Returns a ColumnWriter object you can use to create a column of this type on disk.
Parameters: dbfile – the StructFile to write to.
Stores variable length byte strings. See also RefBytesColumn.
The current implementation limits the total length of all document values in a segment to 2 GB.
The default value (the value returned for a document that didn’t have a value assigned to it at indexing time) is an empty bytestring (b'').
Stores fixed-length byte strings.
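To illustrate why a fixed length saves space, the following sketch compares two toy encodings. This is not Whoosh's on-disk format: the 2-byte length prefix and the helper names `encode_variable`, `encode_fixed`, and `read_fixed` are assumptions made for the example.

```python
import struct


def encode_variable(values):
    """Toy variable-length layout: each value carries a 2-byte length prefix."""
    return b"".join(struct.pack("<H", len(v)) + v for v in values)


def encode_fixed(values, fixedlen):
    """Toy fixed-length layout: values are concatenated with no per-value length."""
    for v in values:
        if len(v) != fixedlen:
            raise ValueError("value %r is not %d bytes long" % (v, fixedlen))
    return b"".join(values)


def read_fixed(data, fixedlen, docnum):
    """With a known width, a row is found by multiplication alone."""
    start = docnum * fixedlen
    return data[start:start + fixedlen]


values = [b"ab12", b"cd34", b"ef56"]
var_data = encode_variable(values)   # 3 * (2 + 4) bytes
fixed_data = encode_fixed(values, 4)  # 3 * 4 bytes
third = read_fixed(fixed_data, 4, 2)
```

In this toy layout the fixed encoding also allows direct access to any row by offset arithmetic, whereas the variable encoding would need either a scan or a separate offset table.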
Stores variable-length or fixed-length byte strings, similar to VarBytesColumn and FixedBytesColumn. However, where those columns store a value for each document, this column keeps a list of all the unique values in the field, and for each document stores a short pointer into the unique list. For fields where the number of possible values is smaller than the number of documents (for example, “category” or “chapter”), this saves significant space.
This column type supports a maximum of 65535 unique values across all documents in a segment. You should generally use this column type where the number of unique values is in no danger of approaching that number (for example, a “tags” field). If you try to index too many unique values, the column will convert additional unique values to the default value and issue a warning using the warnings module (this will usually be preferable to crashing the indexer and potentially losing indexed documents).
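The unique-list-plus-pointer idea, including the fallback to the default value once the limit is reached, can be sketched in plain Python. `RefColumnSketch` is a hypothetical class, not Whoosh's implementation; reserving slot 0 for the default and packing pointers as 2-byte unsigned integers are assumptions consistent with the 65535 limit described above.

```python
import struct
import warnings


class RefColumnSketch:
    """Toy reference column: one list of unique values, one small pointer per row."""

    def __init__(self, default=b"", max_unique=65535):
        self.uniques = [default]   # slot 0 is reserved for the default value
        self.index = {default: 0}  # value -> position in self.uniques
        self.pointers = []         # one entry per document
        self.max_unique = max_unique

    def add(self, value):
        ref = self.index.get(value)
        if ref is None:
            if len(self.uniques) >= self.max_unique:
                # Over the limit: warn and store the default instead of failing.
                warnings.warn("too many unique values; storing the default")
                ref = 0
            else:
                ref = len(self.uniques)
                self.uniques.append(value)
                self.index[value] = ref
        self.pointers.append(ref)

    def get(self, docnum):
        return self.uniques[self.pointers[docnum]]

    def pointer_bytes(self):
        # Each pointer fits in an unsigned 16-bit integer.
        return struct.pack("<%dH" % len(self.pointers), *self.pointers)


col = RefColumnSketch(max_unique=3)  # tiny limit so the fallback is visible
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for value in [b"news", b"sport", b"news", b"tech"]:
        col.add(value)
```

Here `b"tech"` arrives after the (artificially small) limit is reached, so its row points at the default value and a warning is recorded rather than an exception raised.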
Stores numbers (integers and floats) as compact binary.
Stores a column of True/False values compactly.
Parameters: compress_at – columns with this number of values or fewer will be saved compressed on disk, and loaded into RAM for reading. Set this to 0 to disable compression.
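As a rough illustration of what storing True/False values compactly can mean, the sketch below packs eight values into each byte. The helpers `pack_bits` and `read_bit` and the bit order are assumptions, not the column's actual format, and the sketch does not model the compress_at behavior.

```python
def pack_bits(flags):
    """Pack a sequence of booleans into bytes, eight rows per byte."""
    out = bytearray((len(flags) + 7) // 8)
    for docnum, flag in enumerate(flags):
        if flag:
            out[docnum >> 3] |= 1 << (docnum & 7)
    return bytes(out)


def read_bit(data, docnum):
    """Return the boolean stored for docnum."""
    return bool(data[docnum >> 3] & (1 << (docnum & 7)))


flags = [True, False, True, True, False, False, False, False, True, False]
packed = pack_bits(flags)  # ten rows fit in two bytes
restored = [read_bit(packed, i) for i in range(len(flags))]
```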
Stores variable-length byte strings compressed using deflate (by default).
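The trade-off of deflate compression can be seen with the standard library's zlib module, which produces deflate streams in a zlib wrapper. This sketch compresses each value on its own purely for illustration; how the column actually groups or frames values on disk is not covered here, and the compression level of 3 is an arbitrary choice.

```python
import zlib

values = [b"the quick brown fox " * 20, b"", b"short"]

# Compress each value independently (an assumption made for simplicity).
compressed = [zlib.compress(v, 3) for v in values]
restored = [zlib.decompress(c) for c in compressed]

long_saved = len(values[0]) - len(compressed[0])  # bytes saved on the long value
empty_cost = len(compressed[1])                   # overhead added to an empty value
```

Long, repetitive values shrink considerably, while very short values grow because of the stream overhead, so compression pays off mainly for fields with large stored values.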
Converts arbitrary objects to pickled bytestrings and stores them using the wrapped column (usually a VarBytesColumn or CompressedBytesColumn).
If you can express the value you want to store as a number or bytestring, you should use the appropriate column type to avoid the time and size overhead of pickling and unpickling.
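The wrapping idea and the size overhead mentioned above can be sketched as follows. `PickleWrapperSketch` is hypothetical, and a plain list stands in for the wrapped bytes column; only the standard pickle and struct modules are used.

```python
import pickle
import struct


class PickleWrapperSketch:
    """Toy wrapper: pickles objects on the way in, unpickles them on the way out."""

    def __init__(self):
        self.rows = []  # stand-in for the wrapped bytes column

    def add(self, obj):
        self.rows.append(pickle.dumps(obj))

    def get(self, docnum):
        return pickle.loads(self.rows[docnum])


col = PickleWrapperSketch()
col.add({"tags": ["a", "b"], "rank": 3})
col.add(12345)

# The same integer stored as a pickle versus as a packed 32-bit number.
pickled_size = len(col.rows[1])
packed_size = len(struct.pack("<i", 12345))
```

Arbitrary objects round-trip through the wrapper, but even a small integer takes more bytes pickled than packed directly, which is the overhead a numeric or bytes column avoids.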