java.lang.Object
  weka.datagenerators.DataGenerator
    weka.datagenerators.ClusterGenerator
      weka.datagenerators.clusterers.BIRCHCluster
public class BIRCHCluster
extends ClusterGenerator
implements TechnicalInformationHandler
Cluster data generator designed for the BIRCH system.
The dataset is generated with instances in K clusters.
Instances are 2-d data points.
Each cluster is characterized by the number of data points in it, its radius, and its center. The location of the cluster centers is determined by the pattern parameter. Three patterns are currently supported: grid, sine, and random.
For more information refer to:
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: ACM SIGMOD International Conference on Management of Data, 103-114, 1996.
@inproceedings{Zhang1996,
   author = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
   booktitle = {ACM SIGMOD International Conference on Management of Data},
   pages = {103-114},
   publisher = {ACM Press},
   title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
   year = {1996}
}
Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 10).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-k <num> The number of clusters (default 4).
-G Sets the pattern to grid (default is random). This flag cannot be used together with flag I. If neither flag G nor flag I is set, the pattern is random.
-I Sets the pattern to sine (default is random). This flag cannot be used together with flag G. If neither flag G nor flag I is set, the pattern is random.
-N <num>..<num> The range of the number of instances per cluster (default 1..50). The lower number must be between 0 and 2500; the upper number must be between 50 and 2500.
-R <num>..<num> The range of the radius per cluster (default 0.1..1.4142135623730951). The lower number must be between 0 and SQRT(2); the upper number must be between SQRT(2) and SQRT(32).
-M <num> The distance multiplier (default 4.0).
-C <num> The number of cycles (default 4).
-O Sets the input order to ORDERED. If the flag is not set, the input order is RANDOMIZED. RANDOMIZED is currently not implemented, so the input order is always ORDERED.
-P <num> The noise rate in percent (default 0.0). It can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
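The -N and -R options share the <num>..<num> syntax and the bounds listed above. A minimal sketch of how such a value could be parsed and checked, assuming a hypothetical helper class RangeOptionSketch that is not part of Weka. It additionally rejects an -N lower bound greater than the upper bound, which is an assumption rather than documented behavior.

```java
/**
 * Hypothetical helper, not part of Weka: parses "<num>..<num>" values such as
 * those of -N and -R and checks the bounds stated in the option list.
 */
final class RangeOptionSketch {

    /** Splits "lo..hi" at the first ".." into its two trimmed halves. */
    static String[] split(String text) {
        int sep = text.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected <num>..<num>, got: " + text);
        }
        return new String[] {text.substring(0, sep).trim(), text.substring(sep + 2).trim()};
    }

    /** -N: lower in [0, 2500], upper in [50, 2500]; lower <= upper is an extra assumption. */
    static int[] parseInstNumRange(String text) {
        String[] parts = split(text);
        int lo = Integer.parseInt(parts[0]);
        int hi = Integer.parseInt(parts[1]);
        if (lo < 0 || lo > 2500 || hi < 50 || hi > 2500 || lo > hi) {
            throw new IllegalArgumentException("Invalid -N range: " + text);
        }
        return new int[] {lo, hi};
    }

    /** -R: lower in [0, sqrt(2)], upper in [sqrt(2), sqrt(32)]. */
    static double[] parseRadiusRange(String text) {
        String[] parts = split(text);
        double lo = Double.parseDouble(parts[0]);
        double hi = Double.parseDouble(parts[1]);
        double sqrt2 = Math.sqrt(2.0);
        if (lo < 0.0 || lo > sqrt2 || hi < sqrt2 || hi > Math.sqrt(32.0)) {
            throw new IllegalArgumentException("Invalid -R range: " + text);
        }
        return new double[] {lo, hi};
    }
}
```

For example, parseInstNumRange("1..50") yields the default bounds {1, 50}, while "1..20" is rejected because its upper number is below 50.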
Field Summary | |
---|---|
static int | GRID: Constant for the choice of pattern. |
static int | ORDERED: Constant for the input order (option O). |
static int | RANDOM: Constant for the choice of pattern. |
static int | RANDOMIZED: Constant for the input order (default). |
static int | SINE: Constant for the choice of pattern. |
static Tag[] | TAGS_INPUTORDER: The input order tags. |
static Tag[] | TAGS_PATTERN: The pattern tags. |
Constructor Summary | |
---|---|
BIRCHCluster() | Initializes the generator with default values. |
Method Summary | |
---|---|
Instances | defineDataFormat(): Initializes the format for the dataset produced. |
java.lang.String | distMultTipText(): Returns the tip text for this property. |
Instance | generateExample(): Generates an example of the dataset. |
Instances | generateExamples(): Generates all examples of the dataset. |
Instances | generateExamples(java.util.Random random, Instances format): Generates all examples of the dataset. |
java.lang.String | generateFinished(): Compiles documentation about the data generation after the generation process. |
java.lang.String | generateStart(): Compiles documentation about the data generation before the generation process. |
double | getDistMult(): Gets the distance multiplier. |
SelectedTag | getInputOrder(): Gets the input order. |
int | getMaxInstNum(): Gets the upper boundary for instances per cluster. |
double | getMaxRadius(): Gets the upper boundary for the radii of the clusters. |
int | getMinInstNum(): Gets the lower boundary for instances per cluster. |
double | getMinRadius(): Gets the lower boundary for the radii of the clusters. |
double | getNoiseRate(): Gets the percentage of noise. |
int | getNumClusters(): Gets the number of clusters the dataset should have. |
int | getNumCycles(): Gets the number of cycles. |
java.lang.String[] | getOptions(): Gets the current settings of the data generator BIRCHCluster. |
boolean | getOrderedFlag(): Gets the ordered flag (option O). |
SelectedTag | getPattern(): Gets the pattern type. |
java.lang.String | getRevision(): Returns the revision string. |
boolean | getSingleModeFlag(): Gets the single mode flag. |
TechnicalInformation | getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper or book this class is based on. |
java.lang.String | globalInfo(): Returns a string describing this data generator. |
java.lang.String | inputOrderTipText(): Returns the tip text for this property. |
java.util.Enumeration | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] args): Main method for testing this class. |
java.lang.String | maxInstNumTipText(): Returns the tip text for this property. |
java.lang.String | maxRadiusTipText(): Returns the tip text for this property. |
java.lang.String | minInstNumTipText(): Returns the tip text for this property. |
java.lang.String | minRadiusTipText(): Returns the tip text for this property. |
java.lang.String | noiseRateTipText(): Returns the tip text for this property. |
java.lang.String | numClustersTipText(): Returns the tip text for this property. |
java.lang.String | numCyclesTipText(): Returns the tip text for this property. |
java.lang.String | patternTipText(): Returns the tip text for this property. |
void | setDistMult(double newDistMult): Sets the distance multiplier. |
void | setInputOrder(SelectedTag value): Sets the input order. |
void | setMaxInstNum(int newMaxInstNum): Sets the upper boundary for instances per cluster. |
void | setMaxRadius(double newMaxRadius): Sets the upper boundary for the radii of the clusters. |
void | setMinInstNum(int newMinInstNum): Sets the lower boundary for instances per cluster. |
void | setMinRadius(double newMinRadius): Sets the lower boundary for the radii of the clusters. |
void | setNoiseRate(double newNoiseRate): Sets the percentage of noise. |
void | setNumClusters(int numClusters): Sets the number of clusters the dataset should have. |
void | setNumCycles(int newNumCycles): Sets the number of cycles. |
void | setOptions(java.lang.String[] options): Parses a list of options for this object. |
void | setPattern(SelectedTag value): Sets the pattern type. |
Methods inherited from class weka.datagenerators.ClusterGenerator |
---|
booleanColsTipText, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, numAttributesTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices, setNumAttributes |
Methods inherited from class weka.datagenerators.DataGenerator |
---|
debugTipText, defaultOutput, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int GRID
public static final int SINE
public static final int RANDOM
public static final Tag[] TAGS_PATTERN
public static final int ORDERED
public static final int RANDOMIZED
public static final Tag[] TAGS_INPUTORDER
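The GRID, SINE and RANDOM constants correspond to the -G and -I flags described in the option list. The hypothetical PatternFlagsSketch below shows one way those flags could be resolved into a pattern, including the rule that G and I are mutually exclusive. Its constant values are local placeholders, not Weka's; real code would use the BIRCHCluster fields instead.

```java
/**
 * Hypothetical sketch, not Weka's code: resolves the -G / -I flags into a
 * pattern constant. The numeric values are placeholders for illustration.
 */
final class PatternFlagsSketch {
    static final int GRID = 0;   // placeholder value
    static final int SINE = 1;   // placeholder value
    static final int RANDOM = 2; // placeholder value

    static int resolvePattern(boolean gridFlag, boolean sineFlag) {
        if (gridFlag && sineFlag) {
            // The option list forbids setting both flags at once.
            throw new IllegalArgumentException("Flags -G and -I cannot be used together.");
        }
        if (gridFlag) {
            return GRID;
        }
        if (sineFlag) {
            return SINE;
        }
        return RANDOM; // neither flag set: the documented default
    }
}
```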
Constructor Detail |
---|
public BIRCHCluster()
Method Detail |
---|
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
Specified by: getTechnicalInformation in interface TechnicalInformationHandler
public java.util.Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class ClusterGenerator
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class ClusterGenerator
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class ClusterGenerator
See Also: DataGenerator.removeBlacklist(String[])
public void setNumClusters(int numClusters)
Parameters: numClusters - the new number of clusters
public int getNumClusters()
public java.lang.String numClustersTipText()
public int getMinInstNum()
public void setMinInstNum(int newMinInstNum)
Parameters: newMinInstNum - new lower boundary for instances per cluster
public java.lang.String minInstNumTipText()
public int getMaxInstNum()
public void setMaxInstNum(int newMaxInstNum)
Parameters: newMaxInstNum - new upper boundary for instances per cluster
public java.lang.String maxInstNumTipText()
public double getMinRadius()
public void setMinRadius(double newMinRadius)
Parameters: newMinRadius - new lower boundary for the radii of the clusters
public java.lang.String minRadiusTipText()
public double getMaxRadius()
public void setMaxRadius(double newMaxRadius)
Parameters: newMaxRadius - new upper boundary for the radii of the clusters
public java.lang.String maxRadiusTipText()
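The instance-count and radius bounds determine how many points each cluster receives and how spread out it is. The sketch below is a hypothetical illustration, not Weka's implementation: it draws the count and radius uniformly from the configured ranges and samples points from an isotropic Gaussian around the center. The per-dimension standard deviation r / sqrt(d) is an assumption chosen so that the root-mean-square distance of the points from the center equals the radius r.

```java
import java.util.Random;

/** Hypothetical per-cluster sampling sketch; the distribution choices are assumptions. */
final class ClusterSamplingSketch {

    /** Uniform integer in [lo, hi], both ends inclusive (assumption). */
    static int drawInstanceCount(Random rnd, int lo, int hi) {
        return lo + rnd.nextInt(hi - lo + 1);
    }

    /** Uniform double in [lo, hi). */
    static double drawRadius(Random rnd, double lo, double hi) {
        return lo + (hi - lo) * rnd.nextDouble();
    }

    /**
     * Samples n points around center. Each coordinate gets Gaussian noise with
     * standard deviation r / sqrt(d), so the expected squared distance to the
     * center is d * (r^2 / d) = r^2.
     */
    static double[][] samplePoints(Random rnd, double[] center, double r, int n) {
        int d = center.length;
        double sigma = r / Math.sqrt(d);
        double[][] points = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                points[i][j] = center[j] + sigma * rnd.nextGaussian();
            }
        }
        return points;
    }

    /** Root-mean-square distance of the points from the center. */
    static double rmsDistance(double[][] points, double[] center) {
        double sum = 0.0;
        for (double[] p : points) {
            for (int j = 0; j < center.length; j++) {
                double diff = p[j] - center[j];
                sum += diff * diff;
            }
        }
        return Math.sqrt(sum / points.length);
    }
}
```

Passing a seeded java.util.Random mirrors the role of the -S seed: the same seed reproduces the same clusters.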
public SelectedTag getPattern()
public void setPattern(SelectedTag value)
Parameters: value - new pattern type
public java.lang.String patternTipText()
public double getDistMult()
public void setDistMult(double newDistMult)
Parameters: newDistMult - new distance multiplier
public java.lang.String distMultTipText()
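The distance multiplier controls how far apart the cluster centers are placed. As a hypothetical illustration of a grid pattern, the sketch lays K centers out row by row on a ceil(sqrt(K)) by ceil(sqrt(K)) grid, with neighboring centers distMult × (minRadius + maxRadius) / 2 apart. That spacing rule is an assumption for illustration, not a transcription of Weka's code.

```java
/** Hypothetical grid-pattern sketch; the spacing rule is an assumption. */
final class GridPatternSketch {

    /** Returns numClusters 2-d centers laid out row by row on a square grid. */
    static double[][] gridCenters(int numClusters, double distMult,
                                  double minRadius, double maxRadius) {
        int side = (int) Math.ceil(Math.sqrt(numClusters));
        double spacing = distMult * (minRadius + maxRadius) / 2.0; // assumed rule
        double[][] centers = new double[numClusters][2];
        for (int i = 0; i < numClusters; i++) {
            centers[i][0] = (i % side) * spacing; // column index -> x
            centers[i][1] = (i / side) * spacing; // row index -> y
        }
        return centers;
    }
}
```

With the defaults (K = 4, distance multiplier 4.0, radii 0.1..SQRT(2)) this sketch produces a 2 × 2 grid whose neighboring centers are about 3.03 apart.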
public int getNumCycles()
public void setNumCycles(int newNumCycles)
Parameters: newNumCycles - new number of cycles
public java.lang.String numCyclesTipText()
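The number of cycles is relevant to the sine pattern. The sketch below is a hypothetical layout, not Weka's formula: center i sits at x = i × spacing and y = amplitude × sin(2π × numCycles × i / K), so the K centers sample numCycles full periods of a sine curve. The spacing and amplitude parameters are illustrative assumptions.

```java
/** Hypothetical sine-pattern sketch; spacing and amplitude are assumptions. */
final class SinePatternSketch {

    static double[][] sineCenters(int numClusters, int numCycles,
                                  double spacing, double amplitude) {
        double[][] centers = new double[numClusters][2];
        for (int i = 0; i < numClusters; i++) {
            // numCycles full periods are spread evenly over the K centers.
            double phase = 2.0 * Math.PI * numCycles * i / numClusters;
            centers[i][0] = i * spacing;
            centers[i][1] = amplitude * Math.sin(phase);
        }
        return centers;
    }
}
```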
public SelectedTag getInputOrder()
public void setInputOrder(SelectedTag value)
Parameters: value - new input order
public java.lang.String inputOrderTipText()
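The input order decides whether generated instances come out grouped by cluster. The sketch contrasts the two orders on a list of cluster labels: ORDERED emits every instance of cluster 0, then cluster 1, and so on, while the RANDOMIZED variant shuffles that list with a seeded java.util.Random. Because the option list states that RANDOMIZED is not implemented in this generator, the shuffling half is purely a hypothetical illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical input-order sketch; not part of Weka. */
final class InputOrderSketch {

    /** ORDERED: counts[k] copies of label k, cluster by cluster. */
    static List<Integer> orderedLabels(int[] counts) {
        List<Integer> labels = new ArrayList<>();
        for (int k = 0; k < counts.length; k++) {
            for (int i = 0; i < counts[k]; i++) {
                labels.add(k);
            }
        }
        return labels;
    }

    /** Hypothetical RANDOMIZED: the same labels, shuffled reproducibly by seed. */
    static List<Integer> randomizedLabels(int[] counts, long seed) {
        List<Integer> labels = orderedLabels(counts);
        Collections.shuffle(labels, new Random(seed));
        return labels;
    }
}
```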
public boolean getOrderedFlag()
public double getNoiseRate()
public void setNoiseRate(double newNoiseRate)
Parameters: newNoiseRate - new percentage of noise
public java.lang.String noiseRateTipText()
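The noise rate is given in percent and must lie between 0% and 30%. The hypothetical NoiseSketch below validates the rate and converts it into a number of noise points, assuming the percentage is taken of the clustered instances and rounded to the nearest integer. It also draws a noise point uniformly from a bounding box; both that placement and the counting rule are assumptions for illustration only.

```java
import java.util.Random;

/** Hypothetical noise sketch; the counting and placement rules are assumptions. */
final class NoiseSketch {

    /** Rejects rates outside the documented 0..30 percent range. */
    static void checkNoiseRate(double percent) {
        if (percent < 0.0 || percent > 30.0) {
            throw new IllegalArgumentException("Noise rate must be between 0% and 30%: " + percent);
        }
    }

    /** Assumed rule: noise points = round(clusteredInstances * percent / 100). */
    static int noiseCount(int clusteredInstances, double percent) {
        checkNoiseRate(percent);
        return (int) Math.round(clusteredInstances * percent / 100.0);
    }

    /** A noise point drawn uniformly from [min, max) in every dimension. */
    static double[] uniformNoisePoint(Random rnd, int dims, double min, double max) {
        double[] p = new double[dims];
        for (int j = 0; j < dims; j++) {
            p[j] = min + (max - min) * rnd.nextDouble();
        }
        return p;
    }
}
```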
public boolean getSingleModeFlag()
Specified by: getSingleModeFlag in class DataGenerator
public Instances defineDataFormat() throws java.lang.Exception
Overrides: defineDataFormat in class DataGenerator
Throws: java.lang.Exception - data format could not be defined
See Also: DataGenerator.defaultRelationName()
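defineDataFormat() builds the dataset header from the current settings. As a hypothetical illustration only, the sketch prints an ARFF-style header with the relation name (-r), one numeric attribute per dimension (-a) and, when the class flag (-c) is set, an extra nominal attribute listing the K clusters (-k). The attribute names x0, x1, ... and labels c0, c1, ... are placeholders rather than the names Weka uses, and the boolean and nominal columns (-b, -m) are ignored.

```java
/** Hypothetical ARFF header sketch; attribute and label names are placeholders. */
final class HeaderSketch {

    static String arffHeader(String relation, int numAttributes,
                             boolean classFlag, int numClusters) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (int i = 0; i < numAttributes; i++) {
            sb.append("@attribute x").append(i).append(" numeric\n"); // placeholder names
        }
        if (classFlag) {
            // Extra nominal attribute holding the cluster of each instance.
            sb.append("@attribute class {");
            for (int k = 0; k < numClusters; k++) {
                if (k > 0) {
                    sb.append(',');
                }
                sb.append('c').append(k); // placeholder cluster labels
            }
            sb.append("}\n");
        }
        sb.append("@data\n");
        return sb.toString();
    }
}
```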
public Instance generateExample() throws java.lang.Exception
Specified by: generateExample in class DataGenerator
Throws: java.lang.Exception - if format not defined or generating
public Instances generateExamples() throws java.lang.Exception
Specified by: generateExamples in class DataGenerator
Throws: java.lang.Exception - if format not defined
public Instances generateExamples(java.util.Random random, Instances format) throws java.lang.Exception
Parameters: random - the random number generator to use
            format - the dataset format
Throws: java.lang.Exception - if format not defined
public java.lang.String generateFinished() throws java.lang.Exception
Specified by: generateFinished in class DataGenerator
Throws: java.lang.Exception - no input structure has been defined
public java.lang.String generateStart()
Specified by: generateStart in class DataGenerator
public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler
public static void main(java.lang.String[] args)
Parameters: args - should contain arguments for the data producer