weka.attributeSelection
Class SubsetSizeForwardSelection

java.lang.Object
  extended by weka.attributeSelection.ASSearch
      extended by weka.attributeSelection.SubsetSizeForwardSelection
All Implemented Interfaces:
java.io.Serializable, OptionHandler, RevisionHandler

public class SubsetSizeForwardSelection
extends ASSearch
implements OptionHandler

SubsetSizeForwardSelection:

Extension of LinearForwardSelection. The search performs an interior cross-validation (seed and number of folds can be specified). A LinearForwardSelection is performed on each fold to determine the optimal subset size (using the given SubsetSizeEvaluator). Finally, a LinearForwardSelection up to the optimal subset size is performed on the whole data.
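As a rough illustration of the two phases, here is a plain-Java sketch. It is not Weka's code: the `FoldScorer` interface is a hypothetical stand-in for the SubsetSizeEvaluator (with fold `-1` meaning the whole data set), each fold's LinearForwardSelection is simplified to an unrestricted greedy forward search, and picking the size with the best total (equivalently, mean) score over the folds is an assumption about how the fold results are combined.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the subset-size evaluator; fold == -1 means "the whole data set". */
interface FoldScorer {
    double score(int fold, List<Integer> subset);
}

final class SubsetSizeSketch {

    /** Greedy forward selection on one fold; fills curve[s - 1] with the best score at size s. */
    static List<Integer> forward(FoldScorer scorer, int fold, int numAttributes,
                                 int maxSize, double[] curve) {
        List<Integer> selected = new ArrayList<>();
        for (int size = 1; size <= maxSize; size++) {
            int bestAttr = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) continue;      // attribute already in the subset
                List<Integer> candidate = new ArrayList<>(selected);
                candidate.add(a);
                double s = scorer.score(fold, candidate);
                if (s > bestScore) {
                    bestScore = s;
                    bestAttr = a;
                }
            }
            selected.add(bestAttr);
            if (curve != null) curve[size - 1] = bestScore;
        }
        return selected;
    }

    /** Phase 1: the subset size with the highest total (equivalently, mean) score over all folds. */
    static int optimalSize(FoldScorer scorer, int numFolds, int numAttributes) {
        double[] total = new double[numAttributes];
        double[] curve = new double[numAttributes];
        for (int f = 0; f < numFolds; f++) {
            forward(scorer, f, numAttributes, numAttributes, curve);
            for (int i = 0; i < numAttributes; i++) total[i] += curve[i];
        }
        int best = 0;
        for (int i = 1; i < numAttributes; i++) {
            if (total[i] > total[best]) best = i;      // strict '>' prefers the smaller size on ties
        }
        return best + 1;
    }

    /** Phase 2: forward selection on the whole data, stopped at the optimal size. */
    static List<Integer> select(FoldScorer scorer, int numFolds, int numAttributes) {
        int size = optimalSize(scorer, numFolds, numAttributes);
        return forward(scorer, -1, numAttributes, size, null);
    }
}
```

Ties are broken toward the smaller subset, which keeps the final selection compact; this page does not say how Weka itself breaks ties.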

For more information see:

Martin Guetlein (2006). Large Scale Attribute Selection Using Wrappers. Freiburg, Germany.

Valid options are:

 -I
  Perform initial ranking to select the
  top-ranked attributes.
 -K <num>
  Number of top-ranked attributes that are 
  taken into account by the search.
 -T <0 = fixed-set | 1 = fixed-width>
  Type of Linear Forward Selection (default = 0).
 -S <num>
  Size of lookup cache for evaluated subsets.
  Expressed as a multiple of the number of
  attributes in the data set. (default = 1)
 -E <subset evaluator>
  Subset-evaluator used for subset-size determination.
  Place any evaluator options LAST on the command line
  following a "--".
 -F <num>
  Number of cross validation folds
  for subset size determination (default = 5).
 -R <num>
  Seed for cross validation
  subset size determination. (default = 1)
 -Z
  verbose on/off
 
 Options specific to evaluator weka.attributeSelection.ClassifierSubsetEval:
 
 -B <classifier>
  class name of the classifier to use for accuracy estimation.
  Place any classifier options LAST on the command line
  following a "--", e.g.:
   -B weka.classifiers.bayes.NaiveBayes ... -- -K
  (default: weka.classifiers.rules.ZeroR)
 -T
  Use the training data to estimate accuracy.
 -H <filename>
  Name of the hold out/test set to 
  estimate accuracy on.
 
 Options specific to scheme weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
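The "--" convention used by -B splits one argument list in two: everything before the first "--" belongs to the outer component, and everything after it is handed on to the nested one. A minimal sketch of that split, using a hypothetical helper name (`OptionSplit`), not Weka's own option parser:

```java
import java.util.Arrays;

final class OptionSplit {
    /** Returns {outerOptions, nestedOptions}, splitting at the first "--" (which is dropped). */
    static String[][] splitAtDoubleDash(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(args, 0, i),
                    Arrays.copyOfRange(args, i + 1, args.length)
                };
            }
        }
        // No "--": everything belongs to the outer component.
        return new String[][] { args.clone(), new String[0] };
    }
}
```

For the example above, `-B weka.classifiers.bayes.NaiveBayes -- -K` yields `[-B, weka.classifiers.bayes.NaiveBayes]` for the evaluator and `[-K]` for the classifier.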

Version:
$Revision: 5605 $
Author:
Martin Guetlein (martin.guetlein@gmail.com)
See Also:
Serialized Form

Field Summary
static Tag[] TAGS_TYPE
           
 
Constructor Summary
SubsetSizeForwardSelection()
          Constructor
 
Method Summary
 int getLookupCacheSize()
          Return the maximum size of the evaluated subset cache (expressed as a multiplier for the number of attributes in a data set).
 int getNumSubsetSizeCVFolds()
          Get the number of cross validation folds for subset size determination (default = 5).
 int getNumUsedAttributes()
          Get the number of top-ranked attributes that are taken into account by the search process.
 java.lang.String[] getOptions()
          Gets the current settings of SubsetSizeForwardSelection.
 boolean getPerformRanking()
          Get whether initial ranking should be performed to select the top-ranked attributes.
 java.lang.String getRevision()
          Returns the revision string.
 int getSeed()
          Get the seed for cross validation subset size determination.
 ASEvaluation getSubsetSizeEvaluator()
          Get the subset evaluator used for subset size determination.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 SelectedTag getType()
          Get the type of Linear Forward Selection (fixed-set or fixed-width).
 boolean getVerbose()
          Get whether output is to be verbose
 java.lang.String globalInfo()
          Returns a string describing this search method
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 java.lang.String lookupCacheSizeTipText()
          Returns the tip text for this property
 java.lang.String numSubsetSizeCVFoldsTipText()
          Returns the tip text for this property
 java.lang.String numUsedAttributesTipText()
          Returns the tip text for this property
 java.lang.String performRankingTipText()
          Returns the tip text for this property
 int[] search(ASEvaluation ASEval, Instances data)
          Searches the attribute subset space by subset size forward selection
 java.lang.String seedTipText()
          Returns the tip text for this property
 void setLookupCacheSize(int size)
          Set the maximum size of the evaluated subset cache (hashtable).
 void setNumSubsetSizeCVFolds(int f)
          Set the number of cross validation folds for subset size determination (default = 5).
 void setNumUsedAttributes(int k)
          Set the number of top-ranked attributes that are taken into account by the search process.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPerformRanking(boolean b)
          Perform initial ranking to select top-ranked attributes.
 void setSeed(int s)
          Set the seed for cross validation subset size determination.
 void setSubsetSizeEvaluator(ASEvaluation eval)
          Set the subset evaluator to use for subset size determination.
 void setType(SelectedTag t)
          Set the type of Linear Forward Selection (fixed-set or fixed-width).
 void setVerbose(boolean b)
          Set whether verbose output should be generated.
 java.lang.String subsetSizeEvaluatorTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Returns a description of the search as a String.
 java.lang.String typeTipText()
          Returns the tip text for this property
 java.lang.String verboseTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.attributeSelection.ASSearch
forName, makeCopies
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TAGS_TYPE

public static final Tag[] TAGS_TYPE
Constructor Detail

SubsetSizeForwardSelection

public SubsetSizeForwardSelection()
Constructor

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this search method

Returns:
a description of the search method suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Returns:
the technical information about this class

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-I
Perform initial ranking to select top-ranked attributes.

-K <num>
Number of top-ranked attributes that are taken into account.

-T <0 = fixed-set | 1 = fixed-width>
Type of Linear Forward Selection (default = 0).

-S <num>
Size of lookup cache for evaluated subsets. Expressed as a multiple of the number of attributes in the data set. (default = 1)

-E <subset evaluator>
Class name of the subset evaluator to use for subset size determination (default = null, in which case the same subset evaluator as for ranking and the final forward selection is used). Place any evaluator options LAST on the command line following a "--", e.g.: -E weka.attributeSelection.ClassifierSubsetEval ... -- -M

-F <num>
Number of cross validation folds for subset size determination (default = 5).

-R <num>
Seed for cross validation subset size determination. (default = 1)

-Z
verbose on/off.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

setLookupCacheSize

public void setLookupCacheSize(int size)
Set the maximum size of the evaluated subset cache (hashtable). This is expressed as a multiplier for the number of attributes in the data set. (default = 1).

Parameters:
size - the maximum size of the hashtable
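To make the multiplier concrete: with the default of 1 on a data set with 50 attributes, the cache holds up to 50 evaluated subsets. The sketch below models such a bounded hashtable with `java.util.LinkedHashMap`. The class name `SubsetCache`, the `BitSet` keys and the least-recently-used eviction are assumptions for illustration only, since this page does not say what happens once the cache is full.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache of subset merits, keyed by the subset's attribute bitmask. */
final class SubsetCache extends LinkedHashMap<BitSet, Double> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    SubsetCache(int multiplier, int numAttributes) {
        super(16, 0.75f, true);                      // access order: eldest entry = least recently used
        this.capacity = multiplier * numAttributes;  // e.g. -S 1 on 50 attributes -> 50 entries
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<BitSet, Double> eldest) {
        return size() > capacity;                    // evict only once the bound is exceeded
    }

    int capacity() {
        return capacity;
    }
}
```

BitSet keys must not be modified after insertion, because their hash code depends on their contents.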

getLookupCacheSize

public int getLookupCacheSize()
Return the maximum size of the evaluated subset cache (expressed as a multiplier for the number of attributes in a data set).

Returns:
the maximum size of the hashtable.

lookupCacheSizeTipText

public java.lang.String lookupCacheSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

performRankingTipText

public java.lang.String performRankingTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPerformRanking

public void setPerformRanking(boolean b)
Perform initial ranking to select top-ranked attributes.

Parameters:
b - true if initial ranking should be performed

getPerformRanking

public boolean getPerformRanking()
Get whether initial ranking should be performed to select the top-ranked attributes.

Returns:
true if initial ranking should be performed

numUsedAttributesTipText

public java.lang.String numUsedAttributesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumUsedAttributes

public void setNumUsedAttributes(int k)
                          throws java.lang.Exception
Set the number of top-ranked attributes that are taken into account by the search process.

Parameters:
k - the number of attributes
Throws:
java.lang.Exception - if k is less than 2

getNumUsedAttributes

public int getNumUsedAttributes()
Get the number of top-ranked attributes that are taken into account by the search process.

Returns:
the number of top-ranked attributes that are taken into account

typeTipText

public java.lang.String typeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setType

public void setType(SelectedTag t)
Set the type of Linear Forward Selection (fixed-set or fixed-width).

Parameters:
t - the Linear Forward Selection type

getType

public SelectedTag getType()
Get the type of Linear Forward Selection (fixed-set or fixed-width).

Returns:
the Linear Forward Selection type
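The sketch below shows how the candidate pool for the next forward step could be formed from a best-first ranking under each type, with k playing the role of -K (numUsedAttributes). The semantics are an assumption based on our reading of Linear Forward Selection, not taken from this page: fixed-set only ever considers the k top-ranked attributes, while fixed-width keeps k candidates available by admitting the next-ranked attribute each time one is selected. `LfsType` and `LfsCandidates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Mirrors -T 0 (fixed-set) and -T 1 (fixed-width). */
enum LfsType { FIXED_SET, FIXED_WIDTH }

final class LfsCandidates {
    /**
     * Candidate attributes for the next forward step (assumed semantics).
     * ranking: attribute indexes sorted best-first; selected: attributes already chosen.
     */
    static List<Integer> candidates(int[] ranking, Set<Integer> selected, int k, LfsType type) {
        // fixed-set: the window is always the top k, so the pool shrinks as attributes are chosen.
        // fixed-width: the window grows by one per selection, so k candidates remain
        // until the ranking is exhausted.
        int window = (type == LfsType.FIXED_SET) ? k : k + selected.size();
        window = Math.min(window, ranking.length);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < window; i++) {
            if (!selected.contains(ranking[i])) result.add(ranking[i]);
        }
        return result;
    }
}
```

With ranking `{4, 2, 7, 0, 1}`, k = 3 and attribute 2 already selected, fixed-set offers `[4, 7]` while fixed-width offers `[4, 7, 0]`.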

subsetSizeEvaluatorTipText

public java.lang.String subsetSizeEvaluatorTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSubsetSizeEvaluator

public void setSubsetSizeEvaluator(ASEvaluation eval)
                            throws java.lang.Exception
Set the subset evaluator to use for subset size determination.

Parameters:
eval - the subset evaluator to use for subset size determination.
Throws:
java.lang.Exception

getSubsetSizeEvaluator

public ASEvaluation getSubsetSizeEvaluator()
Get the subset evaluator used for subset size determination.

Returns:
the evaluator used for subset size determination.
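According to the -E description under setOptions, the subset-size evaluator defaults to null, in which case the evaluator used for ranking and the final forward selection is reused. That fallback amounts to the following; the generic helper `EvaluatorChoice.forSubsetSize` is a hypothetical sketch, not Weka code.

```java
import java.util.Objects;

final class EvaluatorChoice {
    /**
     * Picks the evaluator for subset-size determination: the dedicated one if set,
     * otherwise the evaluator used for ranking and the final forward selection.
     */
    static <E> E forSubsetSize(E subsetSizeEvaluator, E mainEvaluator) {
        Objects.requireNonNull(mainEvaluator, "main evaluator");
        return subsetSizeEvaluator != null ? subsetSizeEvaluator : mainEvaluator;
    }
}
```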

numSubsetSizeCVFoldsTipText

public java.lang.String numSubsetSizeCVFoldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumSubsetSizeCVFolds

public void setNumSubsetSizeCVFolds(int f)
Set the number of cross validation folds for subset size determination (default = 5).

Parameters:
f - number of folds

getNumSubsetSizeCVFolds

public int getNumSubsetSizeCVFolds()
Get the number of cross validation folds for subset size determination (default = 5).

Returns:
number of folds

seedTipText

public java.lang.String seedTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setSeed

public void setSeed(int s)
Set the seed for cross validation subset size determination. (default = 1)

Parameters:
s - seed

getSeed

public int getSeed()
Get the seed for cross validation subset size determination. (default = 1)

Returns:
seed
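The -F and -R settings together fix how the interior cross-validation partitions the data. A minimal sketch of seeded fold assignment: instance indexes are shuffled with `java.util.Random(seed)` and dealt round-robin into the folds, so the same seed always reproduces the same split. This is one common scheme and not necessarily Weka's (which may, for example, stratify by class); `CvFolds` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class CvFolds {
    /** Assigns instance indexes 0..n-1 to numFolds folds after a seeded shuffle. */
    static List<List<Integer>> assign(int n, int numFolds, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed));   // same seed -> same permutation
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) {
            folds.get(i % numFolds).add(order.get(i));  // deal round-robin
        }
        return folds;
    }
}
```

With 23 instances and the default 5 folds, the folds receive 5, 5, 5, 4 and 4 instances.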

verboseTipText

public java.lang.String verboseTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setVerbose

public void setVerbose(boolean b)
Set whether verbose output should be generated.

Parameters:
b - true if output is to be verbose.

getVerbose

public boolean getVerbose()
Get whether output is to be verbose

Returns:
true if output will be verbose

getOptions

public java.lang.String[] getOptions()
Gets the current settings of SubsetSizeForwardSelection.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions()
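Because getOptions returns an array that setOptions can consume, the settings round-trip through the flag syntax documented above. The sketch below uses a hypothetical `SsfsSettings` holder with the documented defaults (-T 0, -S 1, -F 5, -R 1). The initial -K value and the order of the flags are assumptions, -E is omitted, and the setter throws `IllegalArgumentException` where the real setNumUsedAttributes is declared to throw java.lang.Exception for k less than 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical settings holder mirroring the documented options; not Weka's class. */
final class SsfsSettings {
    boolean performRanking;        // -I
    int numUsedAttributes = 2;     // -K (placeholder initial value for this sketch)
    int type = 0;                  // -T: 0 = fixed-set, 1 = fixed-width (documented default 0)
    int lookupCacheSize = 1;       // -S (documented default 1)
    int numFolds = 5;              // -F (documented default 5)
    int seed = 1;                  // -R (documented default 1)
    boolean verbose;               // -Z

    void setNumUsedAttributes(int k) {
        // The Javadoc states that k < 2 is rejected with an exception.
        if (k < 2) throw new IllegalArgumentException("-K must be at least 2, got " + k);
        numUsedAttributes = k;
    }

    /** Builds an option array in the documented flag syntax (flag order is an assumption). */
    String[] toOptions() {
        List<String> opts = new ArrayList<>();
        if (performRanking) opts.add("-I");
        opts.add("-K"); opts.add(String.valueOf(numUsedAttributes));
        opts.add("-T"); opts.add(String.valueOf(type));
        opts.add("-S"); opts.add(String.valueOf(lookupCacheSize));
        opts.add("-F"); opts.add(String.valueOf(numFolds));
        opts.add("-R"); opts.add(String.valueOf(seed));
        if (verbose) opts.add("-Z");
        return opts.toArray(new String[0]);
    }
}
```

With ranking enabled and -K set to 10, `toOptions()` yields `-I -K 10 -T 0 -S 1 -F 5 -R 1`.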

toString

public java.lang.String toString()
Returns a description of the search as a String.

Overrides:
toString in class java.lang.Object
Returns:
a description of the search

search

public int[] search(ASEvaluation ASEval,
                    Instances data)
             throws java.lang.Exception
Searches the attribute subset space by subset size forward selection

Specified by:
search in class ASSearch
Parameters:
ASEval - the attribute evaluator to guide the search
data - the training instances.
Returns:
an array (not necessarily ordered) of selected attribute indexes
Throws:
java.lang.Exception - if the search can't be completed
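Since search does not promise any particular order for the returned indexes, callers that print or compare selections may want to normalize the result first. A small sketch with hypothetical helper names:

```java
import java.util.Arrays;

final class SelectionResult {
    /** Returns a sorted copy, leaving the array returned by the search untouched. */
    static int[] normalized(int[] selected) {
        int[] copy = selected.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** True if both arrays select the same attribute indexes, regardless of order. */
    static boolean sameSelection(int[] a, int[] b) {
        return Arrays.equals(normalized(a), normalized(b));
    }
}
```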

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class ASSearch
Returns:
the revision