The FieldInfo class is the field descriptor for the index. It specifies
whether a field is compressed and whether it should be indexed and
tokenized. Every field has a name, which must be a symbol. There are three
properties that you can set: :store, :index and :term_vector. You can also
set the default :boost for a field.
The :store property allows you to specify how a field is stored. You can
leave a field unstored (:no), store it in its original format (:yes) or
store it in compressed format (:compressed). By default the field is stored
in its original format. If the field is large and it is stored elsewhere
where it is easily accessible, you might want to leave it unstored. This
keeps the index smaller and makes indexing faster. For example, you should
probably leave the :content field unstored when indexing all the documents
in your file-system.
The :index property allows you to specify how a field is indexed. A field
must be indexed to be searchable. However, a field doesn't need to be
indexed to be stored in the Ferret index. You may want to use the index as
a simple database and store things like images or MP3s in the index. By
default each field is indexed and tokenized, that is, split into tokens
(:yes). If you don't want to index the field, use :no. If you want the
field indexed but not tokenized, use :untokenized. Do this for the fields
you wish to sort by.
There are two other values for :index: :omit_norms and
:untokenized_omit_norms. These values correspond to :yes and :untokenized
respectively and are useful if you are not boosting any fields and you'd
like to speed up the index. The norms file is the file which contains the
boost values for each document for a particular field.
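The five :index values can be read as combinations of three flags. The lookup below is a minimal sketch of that reading in plain Ruby. INDEX_FLAGS and norms_file? are hypothetical names used only for illustration; they are not part of Ferret, and the flags shown for :no are an assumption, since the text does not say what omitting norms means for an unindexed field.

```ruby
# Hypothetical model of the documented :index values; not Ferret code.
INDEX_FLAGS = {
  :no                     => { :indexed => false, :tokenized => false, :omit_norms => false },
  :yes                    => { :indexed => true,  :tokenized => true,  :omit_norms => false },
  :untokenized            => { :indexed => true,  :tokenized => false, :omit_norms => false },
  :omit_norms             => { :indexed => true,  :tokenized => true,  :omit_norms => true  },
  :untokenized_omit_norms => { :indexed => true,  :tokenized => false, :omit_norms => true  }
}.freeze

# A norms file exists only for a field that is indexed and does not omit norms.
def norms_file?(index_value)
  flags = INDEX_FLAGS.fetch(index_value) # raises KeyError for unknown values
  flags[:indexed] && !flags[:omit_norms]
end

norms_file?(:yes)         # => true
norms_file?(:omit_norms)  # => false
```

Read this way, :omit_norms and :untokenized_omit_norms differ from :yes and :untokenized only in the norms flag.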
See TermVector for a description of term-vectors. You can specify whether
or not you would like to store term-vectors. The available options are
:no, :yes, :with_positions, :with_offsets and :with_positions_offsets. Note
that you need to store the positions to associate offsets with individual
terms in the term_vector.
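As a rough illustration of how the :term_vector values relate to each other, the sketch below maps each value to what it stores. TERM_VECTOR_FLAGS and offsets_per_term? are hypothetical names for this example and do not exist in Ferret.

```ruby
# Hypothetical model of the documented :term_vector values; not Ferret code.
TERM_VECTOR_FLAGS = {
  :no                     => { :term_vector => false, :positions => false, :offsets => false },
  :yes                    => { :term_vector => true,  :positions => false, :offsets => false },
  :with_positions         => { :term_vector => true,  :positions => true,  :offsets => false },
  :with_offsets           => { :term_vector => true,  :positions => false, :offsets => true  },
  :with_positions_offsets => { :term_vector => true,  :positions => true,  :offsets => true  }
}.freeze

# Offsets can be tied to individual terms only when positions are stored too.
def offsets_per_term?(value)
  flags = TERM_VECTOR_FLAGS.fetch(value) # raises KeyError for unknown values
  flags[:positions] && flags[:offsets]
end

offsets_per_term?(:with_offsets)            # => false
offsets_per_term?(:with_positions_offsets)  # => true
```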
Property       Value                     Description
------------------------------------------------------------------------
:store       | :no                     | Don't store field
             |                         |
             | :yes (default)          | Store field in its original
             |                         | format. Use this value if you
             |                         | want to highlight matches
             |                         | or print match excerpts a la
             |                         | Google search.
             |                         |
             | :compressed             | Store field in compressed
             |                         | format.
-------------|-------------------------|------------------------------
:index       | :no                     | Do not make this field
             |                         | searchable.
             |                         |
             | :yes (default)          | Make this field searchable and
             |                         | tokenize its contents.
             |                         |
             | :untokenized            | Make this field searchable but
             |                         | do not tokenize its contents.
             |                         | Use this value for fields you
             |                         | wish to sort by.
             |                         |
             | :omit_norms             | Same as :yes except omit the
             |                         | norms file. The norms file can
             |                         | be omitted if you don't boost
             |                         | any fields and you don't need
             |                         | scoring based on field length.
             |                         |
             | :untokenized_omit_norms | Same as :untokenized except omit
             |                         | the norms file. Norms files can
             |                         | be omitted if you don't boost
             |                         | any fields and you don't need
             |                         | scoring based on field length.
             |                         |
-------------|-------------------------|------------------------------
:term_vector | :no                     | Don't store term-vectors
             |                         |
             | :yes                    | Store term-vectors without
             |                         | storing positions or offsets.
             |                         |
             | :with_positions         | Store term-vectors with
             |                         | positions.
             |                         |
             | :with_offsets           | Store term-vectors with
             |                         | offsets.
             |                         |
             | :with_positions_offsets | Store term-vectors with
             | (default)               | positions and offsets.
-------------|-------------------------|------------------------------
:boost       | Float                   | The boost property is used to
             |                         | set the default boost for a
             |                         | field. This boost value will
             |                         | be used for all instances of
             |                         | the field in the index unless
             |                         | otherwise specified when you
             |                         | create the field. All values
             |                         | should be positive.
             |                         |
fi = FieldInfo.new(:title, :index => :untokenized, :term_vector => :no,
                   :boost => 10.0)

fi = FieldInfo.new(:content)

fi = FieldInfo.new(:created_on, :index => :untokenized_omit_norms,
                   :term_vector => :no)

fi = FieldInfo.new(:image, :store => :compressed, :index => :no,
                   :term_vector => :no)
Create a new FieldInfo object with the name name and the properties
specified in options. The available options are [:store, :index,
:term_vector, :boost]. See the description of FieldInfo for more
information on these properties.
static VALUE
frb_fi_init(int argc, VALUE *argv, VALUE self)
{
    VALUE roptions, rname;
    FieldInfo *fi;
    StoreValue store = STORE_YES;
    IndexValue index = INDEX_YES;
    TermVectorValue term_vector = TERM_VECTOR_WITH_POSITIONS_OFFSETS;
    float boost = 1.0f;

    rb_scan_args(argc, argv, "11", &rname, &roptions);
    if (argc > 1) {
        frb_fi_get_params(roptions, &store, &index, &term_vector, &boost);
    }
    fi = fi_new(frb_field(rname), store, index, term_vector);
    fi->boost = boost;
    Frt_Wrap_Struct(self, NULL, &frb_fi_free, fi);
    object_add(fi, self);
    return self;
}
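The initial values in the C source above suggest which defaults apply when an option is left out. The following plain-Ruby sketch mirrors that behaviour with a hash merge. FIELD_DEFAULTS and resolve_field_options are hypothetical names for illustration, and the sketch does not reproduce whatever validation frb_fi_get_params performs.

```ruby
# Hypothetical stand-in for the option handling in FieldInfo.new.
# The defaults mirror the initial values in the C initializer:
# STORE_YES, INDEX_YES, TERM_VECTOR_WITH_POSITIONS_OFFSETS and 1.0f.
FIELD_DEFAULTS = {
  :store       => :yes,
  :index       => :yes,
  :term_vector => :with_positions_offsets,
  :boost       => 1.0
}.freeze

# Options supplied by the caller override the defaults; the rest are kept.
def resolve_field_options(options = {})
  FIELD_DEFAULTS.merge(options)
end

title = resolve_field_options(:index => :untokenized, :term_vector => :no,
                              :boost => 10.0)
title[:store]  # => :yes
title[:boost]  # => 10.0
```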
Return the default boost for this field.
static VALUE
frb_fi_boost(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return rb_float_new((double)fi->boost);
}
Return true if the field is stored in the index in compressed format.
static VALUE
frb_fi_is_compressed(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_is_compressed(fi) ? Qtrue : Qfalse;
}
Return true if this field has a norms file. This is the same as calling:
fi.indexed? and not fi.omit_norms?
static VALUE
frb_fi_has_norms(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_has_norms(fi) ? Qtrue : Qfalse;
}
Return true if the field is indexed, i.e. searchable in the index.
static VALUE
frb_fi_is_indexed(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_is_indexed(fi) ? Qtrue : Qfalse;
}
Return the name of the field.
static VALUE
frb_fi_name(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return ID2SYM(fi->name);
}
Return true if the field omits the norms file. The norms file is the file used to store the field boosts for an indexed field. If you do not boost any fields, and you can live without scoring based on field length, then you can omit the norms file. This gives the index a slight performance boost and reduces memory use, especially for indexes which have a large number of documents.
static VALUE
frb_fi_omit_norms(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_omit_norms(fi) ? Qtrue : Qfalse;
}
Return true if offsets are stored with the term-vectors for this field.
static VALUE
frb_fi_store_offsets(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_store_offsets(fi) ? Qtrue : Qfalse;
}
Return true if positions are stored with the term-vectors for this field.
static VALUE
frb_fi_store_positions(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_store_positions(fi) ? Qtrue : Qfalse;
}
Return true if the term-vectors are stored for this field.
static VALUE
frb_fi_store_term_vector(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_store_term_vector(fi) ? Qtrue : Qfalse;
}
Return true if the field is stored in the index.
static VALUE
frb_fi_is_stored(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_is_stored(fi) ? Qtrue : Qfalse;
}
Return a string representation of the FieldInfo object.
static VALUE
frb_fi_to_s(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    char *fi_s = fi_to_s(fi);
    VALUE rfi_s = rb_str_new2(fi_s);
    free(fi_s);
    return rfi_s;
}
Return true if the field is tokenized. Tokenizing is the process of breaking the field up into tokens. That is, "the quick brown fox" becomes:
["the", "quick", "brown", "fox"]
A field can only be tokenized if it is indexed.
static VALUE
frb_fi_is_tokenized(VALUE self)
{
    FieldInfo *fi = (FieldInfo *)DATA_PTR(self);
    return fi_is_tokenized(fi) ? Qtrue : Qfalse;
}
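To make the tokenizing step concrete, here is a deliberately naive sketch in plain Ruby that splits on whitespace. naive_tokenize is a hypothetical helper, not a Ferret method, and a real analyzer would typically do more than this (case folding or punctuation handling, for instance).

```ruby
# Naive whitespace tokenizer for illustration only; not how Ferret analyzes text.
def naive_tokenize(text)
  text.split # with no arguments, String#split breaks on runs of whitespace
end

naive_tokenize("the quick brown fox")  # => ["the", "quick", "brown", "fox"]
```

An untokenized field skips this step, so its whole value is treated as one unit.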