Package weka.classifiers.trees
Class RandomForest
- All Implemented Interfaces:
Serializable, Cloneable, Classifier, AdditionalMeasureProducer, Aggregateable<Bagging>, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, PartitionGenerator, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler
Class for constructing a forest of random trees.
For more information see:
Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.
BibTeX:
@article{Breiman2001,
author = {Leo Breiman},
journal = {Machine Learning},
number = {1},
pages = {5-32},
title = {Random Forests},
volume = {45},
year = {2001}
}
Valid options are:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-store-out-of-bag-predictions Whether to store out of bag predictions in internal evaluation object.
-output-out-of-bag-complexity-statistics Whether to output complexity-based statistics when out-of-bag evaluation is performed.
-print Print the individual classifiers in the output
-attribute-importance Compute and output attribute importance (mean impurity decrease method)
-I <num> Number of iterations (i.e., the number of trees in the random forest). (default 100)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism) (use 0 to auto-detect number of cores)
-K <number of attributes> Number of attributes to randomly investigate. (default 0) (<1 = int(log_2(#predictors)+1)).
-M <minimum number of instances> Set minimum number of instances per leaf. (default 1)
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-S <num> Seed for random number generator. (default 1)
-depth <num> The maximum depth of the tree, 0 for unlimited. (default 0)
-N <num> Number of folds for backfitting (default 0, no backfitting).
-U Allow unclassified instances.
-B Break ties randomly when several attributes look equally good.
-output-debug-info If set, classifier is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
-num-decimal-places The number of decimal places for the output of numbers in the model (default 2).
-batch-size The desired batch size for batch prediction (default 100).
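The fallback described for -K (values below 1 resolve to int(log_2(#predictors)+1)) can be sketched in plain Java. This is only an illustration of the documented formula: the class and method names are hypothetical, and capping the result at the number of predictors is an assumption of the sketch rather than documented behavior.

```java
// Sketch of the documented default for -K; DefaultNumFeatures and resolve are hypothetical names.
class DefaultNumFeatures {

    /**
     * Resolves the number of attributes to consider at each split.
     * Values of k below 1 fall back to int(log_2(numPredictors) + 1), as the -K option describes.
     */
    static int resolve(int k, int numPredictors) {
        if (numPredictors < 1) {
            throw new IllegalArgumentException("numPredictors must be positive");
        }
        if (k < 1) {
            // log_2(x) computed as ln(x) / ln(2); the int cast truncates toward zero.
            k = (int) (Math.log(numPredictors) / Math.log(2) + 1);
        }
        // Capping at the number of predictors is an assumption of this sketch.
        return Math.min(k, numPredictors);
    }

    public static void main(String[] args) {
        for (int predictors : new int[] {10, 20, 100}) {
            System.out.println(predictors + " predictors -> " + resolve(0, predictors) + " per split");
        }
    }
}
```

With the default of 0, a data set with 10 predictors would therefore consider 4 randomly chosen attributes at each split.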
- Version: $Revision: 15312 $
- Author: Richard Kirkby (rkirkby@cs.waikato.ac.nz)
Field Summary
Fields inherited from class weka.classifiers.AbstractClassifier
BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT
-
Constructor Summary
Constructor | Description
RandomForest() | Constructor that sets the base classifier for bagging to RandomTree and the default number of iterations to 100.
-
Method Summary
Modifier and Type | Method | Description
String | breakTiesRandomlyTipText() | Returns the tip text for this property.
String | computeAttributeImportanceTipText() | Returns the tip text for this property.
double[] | computeAverageImpurityDecreasePerAttribute(double[] nodeCounts) | Computes the average impurity decrease per attribute over the trees.
boolean | getBreakTiesRandomly() | Get whether to break ties randomly.
Capabilities | getCapabilities() | Returns default capabilities of the base classifier.
boolean | getComputeAttributeImportance() | Get whether to compute and output attribute importance scores.
int | getMaxDepth() | Get the maximum depth of the tree, 0 for unlimited.
int | getNumFeatures() | Get the number of features used in random selection.
String[] | getOptions() | Gets the current settings of the forest.
String | getRevision() | Returns the revision string.
TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
String | globalInfo() | Returns a string describing the classifier.
Enumeration<Option> | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] argv) | Main method for this class.
String | maxDepthTipText() | Returns the tip text for this property.
String | numFeaturesTipText() | Returns the tip text for this property.
String | numIterationsTipText() | Returns the tip text for the number of iterations.
void | setBatchSize(String size) | Set the preferred batch size for batch prediction.
void | setBreakTiesRandomly(boolean newBreakTiesRandomly) | Set whether to break ties randomly.
void | setClassifier(Classifier newClassifier) | This method only accepts RandomTree arguments.
void | setComputeAttributeImportance(boolean computeAttributeImportance) | Set whether to compute and output attribute importance scores.
void | setDebug(boolean debug) | Set debugging mode.
void | setMaxDepth(int value) | Set the maximum depth of the tree, 0 for unlimited.
void | setNumDecimalPlaces(int num) | Set the number of decimal places.
void | setNumFeatures(int newNumFeatures) | Set the number of features to use in random selection.
void | setOptions(String[] options) | Parses a given list of options.
void | setRepresentCopiesUsingWeights(boolean representUsingWeights) | This method only accepts true as its argument.
void | setSeed(int s) | Sets the seed for the random number generator.
String | toString() | Returns description of the bagged classifier.
Methods inherited from class weka.classifiers.meta.Bagging
aggregate, bagSizePercentTipText, batchSizeTipText, buildClassifier, calcOutOfBagTipText, distributionForInstance, distributionsForInstances, enumerateMeasures, finalizeAggregation, generatePartition, getBagSizePercent, getBatchSize, getCalcOutOfBag, getMeasure, getMembershipValues, getOutOfBagEvaluationObject, getOutputOutOfBagComplexityStatistics, getPrintClassifiers, getRepresentCopiesUsingWeights, getStoreOutOfBagPredictions, implementsMoreEfficientBatchPrediction, measureOutOfBagError, numElements, outputOutOfBagComplexityStatisticsTipText, printClassifiersTipText, representCopiesUsingWeightsTipText, setBagSizePercent, setCalcOutOfBag, setOutputOutOfBagComplexityStatistics, setPrintClassifiers, setStoreOutOfBagPredictions, storeOutOfBagPredictionsTipText
Methods inherited from class weka.classifiers.RandomizableParallelIteratedSingleClassifierEnhancer
getSeed, seedTipText
Methods inherited from class weka.classifiers.ParallelIteratedSingleClassifierEnhancer
getNumExecutionSlots, numExecutionSlotsTipText, setNumExecutionSlots
Methods inherited from class weka.classifiers.IteratedSingleClassifierEnhancer
getNumIterations, setNumIterations
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, postExecution, preExecution
Methods inherited from class weka.classifiers.AbstractClassifier
classifyInstance, debugTipText, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, makeCopies, makeCopy, numDecimalPlacesTipText, run, runClassifier, setDoNotCheckCapabilities
-
Constructor Details
-
RandomForest
public RandomForest()
Constructor that sets the base classifier for bagging to RandomTree and the default number of iterations to 100.
-
-
Method Details
-
getCapabilities
Returns default capabilities of the base classifier.
- Specified by: getCapabilities in interface CapabilitiesHandler
- Specified by: getCapabilities in interface Classifier
- Overrides: getCapabilities in class SingleClassifierEnhancer
- Returns: the capabilities of the base classifier
-
globalInfo
Returns a string describing the classifier.
- Overrides: globalInfo in class Bagging
- Returns: a description suitable for displaying in the explorer/experimenter GUI
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by: getTechnicalInformation in interface TechnicalInformationHandler
- Overrides: getTechnicalInformation in class Bagging
- Returns: the technical information about this class
-
numIterationsTipText
Returns the tip text for the number of iterations. Overridden here to be more informative.
- Overrides: numIterationsTipText in class IteratedSingleClassifierEnhancer
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setClassifier
This method only accepts RandomTree arguments.
- Overrides: setClassifier in class SingleClassifierEnhancer
- Parameters: newClassifier - the RandomTree to use.
-
setRepresentCopiesUsingWeights
This method only accepts true as its argument.
- Overrides: setRepresentCopiesUsingWeights in class Bagging
- Parameters: representUsingWeights - must be set to true.
-
numFeaturesTipText
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getNumFeatures
public int getNumFeatures()
Get the number of features used in random selection.
- Returns: value of numFeatures.
-
setNumFeatures
public void setNumFeatures(int newNumFeatures)
Set the number of features to use in random selection.
- Parameters: newNumFeatures - value to assign to numFeatures.
-
computeAttributeImportanceTipText
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setComputeAttributeImportance
public void setComputeAttributeImportance(boolean computeAttributeImportance)
Set whether to compute and output attribute importance scores.
- Parameters: computeAttributeImportance - true to compute attribute importance scores
-
getComputeAttributeImportance
public boolean getComputeAttributeImportance()
Get whether to compute and output attribute importance scores.
- Returns: true if computing attribute importance scores
-
maxDepthTipText
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getMaxDepth
public int getMaxDepth()
Get the maximum depth of the tree, 0 for unlimited.
- Returns: the maximum depth.
-
setMaxDepth
public void setMaxDepth(int value)
Set the maximum depth of the tree, 0 for unlimited.
- Parameters: value - the maximum depth.
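The "0 for unlimited" convention can be read as a simple guard evaluated before a node is split. The following is a minimal sketch under stated assumptions: DepthLimit and mayGrow are hypothetical names, and counting the root as depth 0 is an assumption of the example, not something this page specifies.

```java
// Sketch of a depth-limit check; DepthLimit and mayGrow are hypothetical names.
class DepthLimit {

    /**
     * Returns true if a node at currentDepth may still be split.
     * A maxDepth of 0 (or less) is treated as "unlimited".
     * Counting the root as depth 0 is an assumption of this sketch.
     */
    static boolean mayGrow(int currentDepth, int maxDepth) {
        return maxDepth <= 0 || currentDepth < maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("unlimited, depth 50: " + mayGrow(50, 0));
        System.out.println("limit 3, depth 2:    " + mayGrow(2, 3));
        System.out.println("limit 3, depth 3:    " + mayGrow(3, 3));
    }
}
```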
-
breakTiesRandomlyTipText
Returns the tip text for this property.
- Returns: tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getBreakTiesRandomly
public boolean getBreakTiesRandomly()
Get whether to break ties randomly.
- Returns: true if ties are to be broken randomly.
-
setBreakTiesRandomly
public void setBreakTiesRandomly(boolean newBreakTiesRandomly)
Set whether to break ties randomly.
- Parameters: newBreakTiesRandomly - true if ties are to be broken randomly
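To illustrate what breaking ties randomly means when several attributes look equally good, the sketch below picks a split attribute from an array of gain scores. It is not Weka's implementation: TieBreaker, bestAttribute, and the use of exact equality to detect a tie are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of random tie-breaking between equally good attributes; names are hypothetical.
class TieBreaker {

    /**
     * Returns the index of the attribute with the highest gain.
     * If several attributes share the highest gain, the first one is returned
     * unless breakTiesRandomly is set, in which case one of them is drawn at random.
     */
    static int bestAttribute(double[] gains, boolean breakTiesRandomly, Random random) {
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < gains.length; i++) {
            if (gains[i] > best) {
                best = gains[i];
                candidates.clear();
                candidates.add(i);
            } else if (gains[i] == best) {
                // Exact equality as the tie criterion is an assumption of this sketch.
                candidates.add(i);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no comparable gain values");
        }
        if (!breakTiesRandomly) {
            return candidates.get(0);
        }
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        double[] gains = {0.1, 0.5, 0.5, 0.2};
        System.out.println("deterministic pick: " + bestAttribute(gains, false, new Random(1)));
        System.out.println("random pick:        " + bestAttribute(gains, true, new Random(1)));
    }
}
```

Without random tie-breaking the earlier attribute always wins a tie, which can bias the trees toward attributes that appear first in the data set.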
-
setDebug
public void setDebug(boolean debug)
Set debugging mode.
- Overrides: setDebug in class AbstractClassifier
- Parameters: debug - true if debug output should be printed
-
setNumDecimalPlaces
public void setNumDecimalPlaces(int num)
Set the number of decimal places.
- Overrides: setNumDecimalPlaces in class AbstractClassifier
-
setBatchSize
Set the preferred batch size for batch prediction.
- Specified by: setBatchSize in interface BatchPredictor
- Overrides: setBatchSize in class Bagging
- Parameters: size - the batch size to use
-
setSeed
public void setSeed(int s)
Sets the seed for the random number generator.
- Specified by: setSeed in interface Randomizable
- Overrides: setSeed in class RandomizableParallelIteratedSingleClassifierEnhancer
- Parameters: s - the seed to be used
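The practical effect of a fixed seed is reproducibility: the same seed yields the same random bags, and therefore the same out-of-bag instances. The sketch below shows this with java.util.Random and a bag size given as a percentage of the training set. It is a simplified stand-in, not Weka's sampling code, and SeededBootstrap, sample, and outOfBag are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of seeded bootstrap sampling; not Weka's implementation.
class SeededBootstrap {

    /** Draws bagSizePercent% of n instance indices with replacement, using the given seed. */
    static int[] sample(int n, int bagSizePercent, long seed) {
        Random random = new Random(seed);
        int bagSize = (int) (n * (bagSizePercent / 100.0));
        int[] bag = new int[bagSize];
        for (int i = 0; i < bagSize; i++) {
            bag[i] = random.nextInt(n);
        }
        return bag;
    }

    /** Marks the indices that never appear in the bag (the out-of-bag instances). */
    static boolean[] outOfBag(int n, int[] bag) {
        boolean[] oob = new boolean[n];
        Arrays.fill(oob, true);
        for (int index : bag) {
            oob[index] = false;
        }
        return oob;
    }

    public static void main(String[] args) {
        int[] first = sample(10, 100, 1L);
        int[] second = sample(10, 100, 1L);
        System.out.println("same seed, same bag: " + Arrays.equals(first, second));
        System.out.println("bag             = " + Arrays.toString(first));
        System.out.println("out-of-bag mask = " + Arrays.toString(outOfBag(10, first)));
    }
}
```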
-
toString
Returns description of the bagged classifier.
-
computeAverageImpurityDecreasePerAttribute
public double[] computeAverageImpurityDecreasePerAttribute(double[] nodeCounts) throws WekaException
Computes the average impurity decrease per attribute over the trees.
- Parameters: nodeCounts - an optional array that, if non-null, will hold the count of the number of nodes at which each attribute was used for splitting
- Returns: the average impurity decrease per attribute over the trees
- Throws: WekaException
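One plausible reading of this computation is: sum each attribute's impurity decrease over all trees, count the nodes that split on it, and divide the sum by the count. The sketch below follows that reading. The input layout (one array per tree), the class name ImpurityAverager, and the choice to average per split node are assumptions of the example and may differ from the actual implementation.

```java
import java.util.Arrays;

// Sketch of averaging impurity decrease per attribute across trees.
// The per-tree input layout is an assumption of this example, not Weka's internal format.
class ImpurityAverager {

    /**
     * @param decreasePerTree decreasePerTree[t][a] is the summed impurity decrease of attribute a in tree t
     * @param splitsPerTree   splitsPerTree[t][a] is the number of nodes in tree t that split on attribute a
     * @param nodeCounts      optional output array; if non-null it receives the total split count per attribute
     * @return the impurity decrease per attribute, averaged over the nodes that split on it
     */
    static double[] average(double[][] decreasePerTree, double[][] splitsPerTree, double[] nodeCounts) {
        if (decreasePerTree.length == 0) {
            throw new IllegalArgumentException("at least one tree is required");
        }
        int numAttributes = decreasePerTree[0].length;
        double[] averages = new double[numAttributes];
        double[] counts = (nodeCounts != null) ? nodeCounts : new double[numAttributes];
        Arrays.fill(counts, 0.0);
        for (int t = 0; t < decreasePerTree.length; t++) {
            for (int a = 0; a < numAttributes; a++) {
                averages[a] += decreasePerTree[t][a];
                counts[a] += splitsPerTree[t][a];
            }
        }
        for (int a = 0; a < numAttributes; a++) {
            // Attributes that were never used for splitting keep an average of 0.
            if (counts[a] > 0) {
                averages[a] /= counts[a];
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        double[][] decreases = {{0.5, 0.0}, {0.25, 0.75}};
        double[][] splits = {{2, 0}, {1, 1}};
        double[] nodeCounts = new double[2];
        double[] averages = average(decreases, splits, nodeCounts);
        System.out.println("averages   = " + Arrays.toString(averages));
        System.out.println("nodeCounts = " + Arrays.toString(nodeCounts));
    }
}
```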
-
listOptions
Returns an enumeration describing the available options.
- Specified by: listOptions in interface OptionHandler
- Overrides: listOptions in class Bagging
- Returns: an enumeration of all the available options
-
getOptions
Gets the current settings of the forest.
- Specified by: getOptions in interface OptionHandler
- Overrides: getOptions in class Bagging
- Returns: an array of strings suitable for passing to setOptions()
-
setOptions
Parses a given list of options. Valid options are:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-store-out-of-bag-predictions Whether to store out of bag predictions in internal evaluation object.
-output-out-of-bag-complexity-statistics Whether to output complexity-based statistics when out-of-bag evaluation is performed.
-print Print the individual classifiers in the output
-attribute-importance Compute and output attribute importance (mean impurity decrease method)
-I <num> Number of iterations (i.e., the number of trees in the random forest). (default 100)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism) (use 0 to auto-detect number of cores)
-K <number of attributes> Number of attributes to randomly investigate. (default 0) (<1 = int(log_2(#predictors)+1)).
-M <minimum number of instances> Set minimum number of instances per leaf. (default 1)
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-S <num> Seed for random number generator. (default 1)
-depth <num> The maximum depth of the tree, 0 for unlimited. (default 0)
-N <num> Number of folds for backfitting (default 0, no backfitting).
-U Allow unclassified instances.
-B Break ties randomly when several attributes look equally good.
-output-debug-info If set, classifier is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
-num-decimal-places The number of decimal places for the output of numbers in the model (default 2).
-batch-size The desired batch size for batch prediction (default 100).
- Specified by: setOptions in interface OptionHandler
- Overrides: setOptions in class Bagging
- Parameters: options - the list of options as an array of strings
- Throws: Exception - if an option is not supported
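The options array is a flat sequence of strings in which a flag that takes a value is followed by that value as a separate element, while switches such as -O or -B stand alone. The sketch below builds such an array from the options listed above and reads it back. OptionArrayDemo, valueOf, and hasFlag are hypothetical helpers written for this illustration and are not part of Weka.

```java
import java.util.Arrays;

// Sketch of the flag/value array layout that setOptions(String[]) expects.
// OptionArrayDemo, valueOf and hasFlag are hypothetical helpers, not part of Weka.
class OptionArrayDemo {

    /** Returns the value that follows the given flag, or the fallback if the flag is absent. */
    static String valueOf(String flag, String[] options, String fallback) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return fallback;
    }

    /** Returns true if a value-less switch such as -O or -B is present. */
    static boolean hasFlag(String flag, String[] options) {
        return Arrays.asList(options).contains(flag);
    }

    public static void main(String[] args) {
        // 200 trees, default -K, depth limit 10, out-of-bag error, auto-detected execution slots.
        String[] options = {"-I", "200", "-K", "0", "-depth", "10", "-O", "-num-slots", "0"};
        System.out.println("trees = " + valueOf("-I", options, "100"));
        System.out.println("seed  = " + valueOf("-S", options, "1"));
        System.out.println("oob   = " + hasFlag("-O", options));
    }
}
```

Flags that are left out of the array, such as -S here, fall back to the defaults given in the option list.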
-
getRevision
Returns the revision string.
- Specified by: getRevision in interface RevisionHandler
- Overrides: getRevision in class Bagging
- Returns: the revision
-
main
Main method for this class.
- Parameters: argv - the options
-