Package weka.datagenerators.clusterers
Class BIRCHCluster
java.lang.Object
weka.datagenerators.DataGenerator
weka.datagenerators.ClusterGenerator
weka.datagenerators.clusterers.BIRCHCluster
- All Implemented Interfaces:
Serializable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler
Cluster data generator designed for the BIRCH System.
Dataset is generated with instances in K clusters.
Instances are 2-d data points.
Each cluster is characterized by the number of data points in it, its radius and its center. The location of the cluster centers is determined by the pattern parameter. Three patterns are currently supported: grid, sine and random.
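As a rough illustration of the description above, the following sketch draws the points of a single cluster from a size range, a radius range and a center. It is not Weka's implementation: the class ClusterPointsSketch, the method samplePoints and the Gaussian scatter with a per-axis standard deviation of radius / sqrt(2) are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration; not part of Weka. */
final class ClusterPointsSketch {

    /** Draws {@code count} 2-d points scattered around the center (cx, cy). */
    static List<double[]> samplePoints(double cx, double cy, double radius,
                                       int count, Random rnd) {
        // Assumption: independent Gaussian noise per axis, scaled by the radius.
        double sigma = radius / Math.sqrt(2.0);
        List<double[]> points = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            points.add(new double[] {
                cx + sigma * rnd.nextGaussian(),
                cy + sigma * rnd.nextGaussian()
            });
        }
        return points;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);                    // seed, as with option -S
        int minInst = 1, maxInst = 50;                 // default range of option -N
        double minRadius = 0.1, maxRadius = Math.sqrt(2.0); // default range of option -R

        // Pick the cluster size and radius uniformly from their ranges.
        int count = minInst + rnd.nextInt(maxInst - minInst + 1);
        double radius = minRadius + rnd.nextDouble() * (maxRadius - minRadius);

        List<double[]> points = samplePoints(0.0, 0.0, radius, count, rnd);
        System.out.println(points.size() + " points generated with radius " + radius);
    }
}
```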
For more information refer to:
Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: ACM SIGMOD International Conference on Management of Data, 103-114, 1996. BibTeX:
@inproceedings{Zhang1996,
author = {Tian Zhang and Raghu Ramakrishnan and Miron Livny},
booktitle = {ACM SIGMOD International Conference on Management of Data},
pages = {103-114},
publisher = {ACM Press},
title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
year = {1996}
}
Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for random function (default 1)
-a <num> The number of attributes (default 10).
-c Class Flag, if set, the cluster is listed in extra attribute.
-k <num> The number of clusters (default 4)
-G Set pattern to grid (default is random). This flag cannot be used at the same time as flag I. The pattern is random, if neither flag G nor flag I is set.
-I Set pattern to sine (default is random). This flag cannot be used at the same time as flag G. The pattern is random, if neither flag G nor flag I is set.
-N <num>..<num> The range of number of instances per cluster (default 1..50). Lower number must be between 0 and 2500, upper number must be between 50 and 2500.
-R <num>..<num> The range of radius per cluster (default 0.1..1.4142135623730951). Lower number must be between 0 and SQRT(2), upper number must be between SQRT(2) and SQRT(32).
-M <num> The distance multiplier (default 4.0).
-C <num> The number of cycles (default 4).
-O Flag for input order is ORDERED. If the flag is not set, the input order is RANDOMIZED. RANDOMIZED is currently not implemented, so the input order is always ORDERED.
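The range options -N and -R take the form <num>..<num>. A minimal sketch of parsing and bounds-checking such a value, using the limits documented for -N, could look like the following. RangeOptionSketch and parseRange are hypothetical names and not part of Weka, and the check that the lower number does not exceed the upper number is an added assumption.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of Weka. */
final class RangeOptionSketch {

    /**
     * Parses "<lower>..<upper>" and checks each number against its allowed interval.
     * Returns a two-element array {lower, upper}.
     */
    static int[] parseRange(String text, int lowerMin, int lowerMax,
                            int upperMin, int upperMax) {
        int sep = text.indexOf("..");
        if (sep < 0) {
            throw new IllegalArgumentException("Expected <num>..<num>, got: " + text);
        }
        int lower = Integer.parseInt(text.substring(0, sep).trim());
        int upper = Integer.parseInt(text.substring(sep + 2).trim());
        if (lower < lowerMin || lower > lowerMax) {
            throw new IllegalArgumentException("Lower number out of bounds: " + lower);
        }
        if (upper < upperMin || upper > upperMax) {
            throw new IllegalArgumentException("Upper number out of bounds: " + upper);
        }
        // Assumption: a range whose lower number exceeds its upper number is rejected.
        if (lower > upper) {
            throw new IllegalArgumentException("Lower number exceeds upper number");
        }
        return new int[] {lower, upper};
    }

    public static void main(String[] args) {
        // Limits as documented for -N: lower in [0, 2500], upper in [50, 2500].
        int[] range = parseRange("1..50", 0, 2500, 50, 2500);
        System.out.println(Arrays.toString(range)); // prints [1, 50]
    }
}
```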
- Version:
- $Revision: 15707 $
- Author:
- Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
Field Summary
Fields
static final int GRID - Constant set for choice of pattern.
static final int ORDERED - Constant set for input order (option O)
static final int RANDOM - Constant set for choice of pattern.
static final int RANDOMIZED - Constant set for input order (default)
static final int SINE - Constant set for choice of pattern.
static final Tag[] TAGS_INPUTORDER - the input order tags
static final Tag[] TAGS_PATTERN - the pattern tags
-
Constructor Summary
Constructors
BIRCHCluster() - initializes the generator with default values
-
Method Summary
defineDataFormat() - Initializes the format for the dataset produced.
distMultTipText() - Returns the tip text for this property.
generateExample() - Generate an example of the dataset.
generateExamples() - Generate all examples of the dataset.
generateExamples(Random random, Instances format) - Generate all examples of the dataset.
generateFinished() - Compiles documentation about the data generation after the generation process.
generateStart() - Compiles documentation about the data generation before the generation process.
double getDistMult() - Gets the distance multiplier.
getInputOrder() - Gets the input order.
int getMaxInstNum() - Gets the upper boundary for instances per cluster.
double getMaxRadius() - Gets the upper boundary for the radiuses of the clusters.
int getMinInstNum() - Gets the lower boundary for instances per cluster.
double getMinRadius() - Gets the lower boundary for the radiuses of the clusters.
int getNumClusters() - Gets the number of clusters the dataset should have.
int getNumCycles() - Gets the number of cycles.
String[] getOptions() - Gets the current settings of the data generator BIRCHCluster.
boolean getOrderedFlag() - Gets the ordered flag (option O).
getPattern() - Gets the pattern type.
getRevision() - Returns the revision string.
boolean getSingleModeFlag() - Gets the single mode flag.
getTechnicalInformation() - Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
globalInfo() - Returns a string describing this data generator.
inputOrderTipText() - Returns the tip text for this property.
listOptions() - Returns an enumeration describing the available options.
static void main(String[] args) - Main method for testing this class.
maxInstNumTipText() - Returns the tip text for this property.
maxRadiusTipText() - Returns the tip text for this property.
minInstNumTipText() - Returns the tip text for this property.
minRadiusTipText() - Returns the tip text for this property.
numClustersTipText() - Returns the tip text for this property.
numCyclesTipText() - Returns the tip text for this property.
patternTipText() - Returns the tip text for this property.
void setDistMult(double newDistMult) - Sets the distance multiplier.
void setInputOrder(SelectedTag value) - Sets the input order.
void setMaxInstNum(int newMaxInstNum) - Sets the upper boundary for instances per cluster.
void setMaxRadius(double newMaxRadius) - Sets the upper boundary for the radiuses of the clusters.
void setMinInstNum(int newMinInstNum) - Sets the lower boundary for instances per cluster.
void setMinRadius(double newMinRadius) - Sets the lower boundary for the radiuses of the clusters.
void setNumClusters(int numClusters) - Sets the number of clusters the dataset should have.
void setNumCycles(int newNumCycles) - Sets the number of cycles.
void setOptions(String[] options) - Parses a list of options for this object.
void setPattern(SelectedTag value) - Sets the pattern type.
Methods inherited from class weka.datagenerators.ClusterGenerator
classFlagTipText, getClassFlag, getNumAttributes, numAttributesTipText, setClassFlag, setNumAttributes
Methods inherited from class weka.datagenerators.DataGenerator
debugTipText, defaultOutput, enumToVector, formatTipText, getDatasetFormat, getDebug, getEpilogue, getNumExamplesAct, getOutput, getPrologue, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, runDataGenerator, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
-
Field Details
-
GRID
public static final int GRID
Constant set for choice of pattern (option G).
-
SINE
public static final int SINE
Constant set for choice of pattern (option I).
-
RANDOM
public static final int RANDOM
Constant set for choice of pattern (default).
-
TAGS_PATTERN
the pattern tags
-
ORDERED
public static final int ORDERED
Constant set for input order (option O).
-
RANDOMIZED
public static final int RANDOMIZED
Constant set for input order (default).
-
TAGS_INPUTORDER
the input order tags
-
-
Constructor Details
-
BIRCHCluster
public BIRCHCluster()
Initializes the generator with default values.
-
-
Method Details
-
globalInfo
Returns a string describing this data generator.
- Returns:
- a description of the data generator suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class ClusterGenerator
- Returns:
- an enumeration of all the available options
-
setOptions
Parses a list of options for this object. Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for random function (default 1)
-a <num> The number of attributes (default 10).
-c Class Flag, if set, the cluster is listed in extra attribute.
-k <num> The number of clusters (default 4)
-G Set pattern to grid (default is random). This flag cannot be used at the same time as flag I. The pattern is random, if neither flag G nor flag I is set.
-I Set pattern to sine (default is random). This flag cannot be used at the same time as flag G. The pattern is random, if neither flag G nor flag I is set.
-N <num>..<num> The range of number of instances per cluster (default 1..50). Lower number must be between 0 and 2500, upper number must be between 50 and 2500.
-R <num>..<num> The range of radius per cluster (default 0.1..1.4142135623730951). Lower number must be between 0 and SQRT(2), upper number must be between SQRT(2) and SQRT(32).
-M <num> The distance multiplier (default 4.0).
-C <num> The number of cycles (default 4).
-O Flag for input order is ORDERED. If the flag is not set, the input order is RANDOMIZED. RANDOMIZED is currently not implemented, so the input order is always ORDERED.
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class ClusterGenerator
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
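The option list states that flags G and I cannot be combined and that the pattern falls back to random when neither is set. A small sketch of that rule, independent of Weka, is shown below. PatternFlagSketch, its Pattern enum and resolve are hypothetical names; how Weka itself reports the conflict is not shown here, and this simple scan does not distinguish a flag from an option value that happens to look like one.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the -G / -I rule; not part of Weka. */
final class PatternFlagSketch {

    enum Pattern { GRID, SINE, RANDOM }

    /** Derives the pattern from an options array such as {"-k", "4", "-G"}. */
    static Pattern resolve(String[] options) {
        List<String> opts = Arrays.asList(options);
        boolean grid = opts.contains("-G");
        boolean sine = opts.contains("-I");
        if (grid && sine) {
            throw new IllegalArgumentException("Flags -G and -I cannot be used together");
        }
        if (grid) {
            return Pattern.GRID;
        }
        if (sine) {
            return Pattern.SINE;
        }
        return Pattern.RANDOM; // neither flag set
    }

    public static void main(String[] args) {
        System.out.println(resolve(new String[] {"-k", "4", "-G"})); // prints GRID
        System.out.println(resolve(new String[] {"-k", "4"}));       // prints RANDOM
    }
}
```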
-
getOptions
Gets the current settings of the data generator BIRCHCluster.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class ClusterGenerator
- Returns:
- an array of strings suitable for passing to setOptions
- See Also:
-
DataGenerator.removeBlacklist(String[])
-
setNumClusters
public void setNumClusters(int numClusters)
Sets the number of clusters the dataset should have.
- Parameters:
numClusters - the new number of clusters
-
getNumClusters
public int getNumClusters()
Gets the number of clusters the dataset should have.
- Returns:
- the number of clusters the dataset should have
-
numClustersTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinInstNum
public int getMinInstNum()
Gets the lower boundary for instances per cluster.
- Returns:
- the lower boundary for instances per cluster
-
setMinInstNum
public void setMinInstNum(int newMinInstNum)
Sets the lower boundary for instances per cluster.
- Parameters:
newMinInstNum - new lower boundary for instances per cluster
-
minInstNumTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMaxInstNum
public int getMaxInstNum()
Gets the upper boundary for instances per cluster.
- Returns:
- the upper boundary for instances per cluster
-
setMaxInstNum
public void setMaxInstNum(int newMaxInstNum)
Sets the upper boundary for instances per cluster.
- Parameters:
newMaxInstNum - new upper boundary for instances per cluster
-
maxInstNumTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinRadius
public double getMinRadius()
Gets the lower boundary for the radiuses of the clusters.
- Returns:
- the lower boundary for the radiuses of the clusters
-
setMinRadius
public void setMinRadius(double newMinRadius)
Sets the lower boundary for the radiuses of the clusters.
- Parameters:
newMinRadius - new lower boundary for the radiuses of the clusters
-
minRadiusTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMaxRadius
public double getMaxRadius()
Gets the upper boundary for the radiuses of the clusters.
- Returns:
- the upper boundary for the radiuses of the clusters
-
setMaxRadius
public void setMaxRadius(double newMaxRadius)
Sets the upper boundary for the radiuses of the clusters.
- Parameters:
newMaxRadius - new upper boundary for the radiuses of the clusters
-
maxRadiusTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getPattern
Gets the pattern type.
- Returns:
- the current pattern type
-
setPattern
Sets the pattern type.
- Parameters:
value - new pattern type
-
patternTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDistMult
public double getDistMult()
Gets the distance multiplier.
- Returns:
- the distance multiplier
-
setDistMult
public void setDistMult(double newDistMult)
Sets the distance multiplier.
- Parameters:
newDistMult - new distance multiplier
-
distMultTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
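One way to picture the role of the distance multiplier in the grid pattern is the sketch below. It assumes that cluster centers sit on a square grid and that neighboring centers are separated by the distance multiplier times the mean of the minimum and maximum radius, which follows the grid pattern as described in the BIRCH paper. Whether Weka uses exactly this spacing is not confirmed here, and GridPatternSketch and centers are hypothetical names.

```java
/** Hypothetical illustration of a grid layout; not part of Weka. */
final class GridPatternSketch {

    /** Returns one {x, y} center per cluster, filled row by row on a square grid. */
    static double[][] centers(int numClusters, double distMult,
                              double minRadius, double maxRadius) {
        // Smallest square grid that can hold all clusters.
        int side = (int) Math.ceil(Math.sqrt(numClusters));
        // Assumption: spacing scales the average cluster radius by the multiplier.
        double spacing = distMult * (minRadius + maxRadius) / 2.0;
        double[][] centers = new double[numClusters][2];
        for (int i = 0; i < numClusters; i++) {
            centers[i][0] = (i % side) * spacing; // column position
            centers[i][1] = (i / side) * spacing; // row position
        }
        return centers;
    }

    public static void main(String[] args) {
        // Defaults from the option list: 4 clusters, multiplier 4.0, radius 0.1..sqrt(2).
        for (double[] c : centers(4, 4.0, 0.1, Math.sqrt(2.0))) {
            System.out.println(c[0] + ", " + c[1]);
        }
    }
}
```

A larger multiplier spreads the centers further apart relative to the cluster radii, so the clusters overlap less.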
-
getNumCycles
public int getNumCycles()
Gets the number of cycles.
- Returns:
- the number of cycles
-
setNumCycles
public void setNumCycles(int newNumCycles)
Sets the number of cycles.
- Parameters:
newNumCycles - new number of cycles
-
numCyclesTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
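The number of cycles applies to the sine pattern. The sketch below follows the sine layout as described in the BIRCH paper, where the clusters are split evenly over the cycles, cluster i is centered at x = 2*pi*i, and y = (K / cycles) * sin(2*pi*i / (K / cycles)) for K clusters. This is an illustration under that reading, not Weka's code, and SinePatternSketch and centers are hypothetical names.

```java
/** Hypothetical illustration of a sine layout; not part of Weka. */
final class SinePatternSketch {

    /** Returns one {x, y} center per cluster, placed along a sine curve. */
    static double[][] centers(int numClusters, int numCycles) {
        // Number of clusters that share one cycle of the sine curve.
        double perCycle = (double) numClusters / numCycles;
        double[][] centers = new double[numClusters][2];
        for (int i = 0; i < numClusters; i++) {
            centers[i][0] = 2.0 * Math.PI * i;
            centers[i][1] = perCycle * Math.sin(2.0 * Math.PI * i / perCycle);
        }
        return centers;
    }

    public static void main(String[] args) {
        // 8 clusters over 2 cycles: 4 clusters per cycle.
        for (double[] c : centers(8, 2)) {
            System.out.printf("%.3f, %.3f%n", c[0], c[1]);
        }
    }
}
```

Under this formula, when the number of clusters equals the number of cycles, every center lands on the x axis up to rounding, because each cycle then holds a single cluster.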
-
getInputOrder
Gets the input order.
- Returns:
- the current input order
-
setInputOrder
Sets the input order.
- Parameters:
value - new input order
-
inputOrderTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getOrderedFlag
public boolean getOrderedFlag()
Gets the ordered flag (option O).
- Returns:
- true if ordered flag is set
-
getSingleModeFlag
public boolean getSingleModeFlag()
Gets the single mode flag.
- Specified by:
getSingleModeFlag in class DataGenerator
- Returns:
- true if the method generateExample can be used
-
defineDataFormat
Initializes the format for the dataset produced.
- Overrides:
defineDataFormat in class DataGenerator
- Returns:
- the output data format
- Throws:
Exception - if the data format could not be defined
- See Also:
-
DataGenerator.defaultRelationName()
-
generateExample
Generate an example of the dataset.
- Specified by:
generateExample in class DataGenerator
- Returns:
- the instance generated
- Throws:
Exception - if the format is not defined, or if generating examples one by one is not possible because voting is chosen
-
generateExamples
Generate all examples of the dataset.
- Specified by:
generateExamples in class DataGenerator
- Returns:
- the instances generated
- Throws:
Exception - if the format is not defined
-
generateExamples
Generate all examples of the dataset.
- Parameters:
random - the random number generator to use
format - the dataset format
- Returns:
- the instances generated
- Throws:
Exception - if the format is not defined
-
generateFinished
Compiles documentation about the data generation after the generation process.
- Specified by:
generateFinished in class DataGenerator
- Returns:
- string with additional information about generated dataset
- Throws:
Exception - if no input structure has been defined
-
generateStart
Compiles documentation about the data generation before the generation process.
- Specified by:
generateStart in class DataGenerator
- Returns:
- string with additional information
-
getRevision
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
args - should contain arguments for the data producer:
-