Package weka.classifiers.evaluation
Class ThresholdCurve
java.lang.Object
weka.classifiers.evaluation.ThresholdCurve
- All Implemented Interfaces:
RevisionHandler
Generates points illustrating the prediction tradeoffs that can be obtained by
varying the threshold value between classes. For example, the typical
threshold value of 0.5 means the predicted probability of "positive" must be
higher than 0.5 for the instance to be predicted as "positive". The resulting
dataset can be used to visualize the precision/recall tradeoff, or for ROC curve
analysis (true positive rate vs. false positive rate). In each case, Weka simply
varies the threshold applied to the class probability estimates. The
Mann-Whitney statistic is used to calculate the AUC (area under the ROC curve).
- Version:
- $Revision: 15923 $
- Author:
- Len Trigg (len@reeltwo.com)
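The threshold sweep described above can be illustrated with a minimal plain-Java sketch. It does not use Weka, and the names ThresholdSweep and Point are hypothetical. The sketch uses each distinct positive-class probability as a candidate threshold, from highest to lowest. It assumes an instance is predicted "positive" when its probability is at least the threshold, which is a >= comparison. Weka's own tie handling and point ordering may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch (not Weka API): sweep a threshold over positive-class probabilities. */
final class ThresholdSweep {

    /** Confusion counts obtained at one threshold. */
    static final class Point {
        final double threshold;
        final int tp, fn, fp, tn;

        Point(double threshold, int tp, int fn, int fp, int tn) {
            this.threshold = threshold;
            this.tp = tp;
            this.fn = fn;
            this.fp = fp;
            this.tn = tn;
        }

        /** TP / (TP + FN), or 0 if there are no actual positives. */
        double tpRate() { return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn); }

        /** FP / (FP + TN), or 0 if there are no actual negatives. */
        double fpRate() { return fp + tn == 0 ? 0.0 : (double) fp / (fp + tn); }
    }

    private ThresholdSweep() {}

    /** Uses every distinct score as a threshold, highest first; "positive" means score >= threshold. */
    static List<Point> sweep(double[] positiveProb, boolean[] isPositive) {
        TreeSet<Double> thresholds = new TreeSet<>();
        for (double p : positiveProb) {
            thresholds.add(p);
        }
        List<Point> curve = new ArrayList<>();
        for (double t : thresholds.descendingSet()) {
            int tp = 0, fn = 0, fp = 0, tn = 0;
            for (int i = 0; i < positiveProb.length; i++) {
                boolean predictedPositive = positiveProb[i] >= t;
                if (isPositive[i]) {
                    if (predictedPositive) tp++; else fn++;
                } else {
                    if (predictedPositive) fp++; else tn++;
                }
            }
            curve.add(new Point(t, tp, fn, fp, tn));
        }
        return curve;
    }
}
```

Each Point corresponds to one row of a threshold curve. Its fpRate() and tpRate() values are the x and y coordinates of an ROC plot.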
Field Summary
Fields
Modifier and Type | Field | Description
static final String | FALLOUT_NAME | attribute name: Fallout
static final String | FALSE_NEG_NAME | attribute name: False Negatives
static final String | FALSE_POS_NAME | attribute name: False Positives
static final String | FMEASURE_NAME | attribute name: FMeasure
static final String | FP_RATE_NAME | attribute name: False Positive Rate
static final String | LIFT_NAME | attribute name: Lift
static final String | PRECISION_NAME | attribute name: Precision
static final String | RECALL_NAME | attribute name: Recall
static final String | RELATION_NAME | The name of the relation used in threshold curve datasets
static final String | SAMPLE_SIZE_NAME | attribute name: Sample Size
static final String | THRESHOLD_NAME | attribute name: Threshold
static final String | TP_RATE_NAME | attribute name: True Positive Rate
static final String | TRUE_NEG_NAME | attribute name: True Negatives
static final String | TRUE_POS_NAME | attribute name: True Positives
Constructor Summary
Constructors
ThresholdCurve()
Method Summary
Modifier and Type | Method | Description
Instances | getCurve(ArrayList<Prediction> predictions) | Calculates the performance stats for the default class and returns the results as a set of Instances.
Instances | getCurve(ArrayList<Prediction> predictions, int classIndex) | Calculates the performance stats for the desired class and returns the results as a set of Instances.
static double | getNPointPrecision(Instances tcurve, int n) | Calculates the n-point precision, which is the precision averaged over n samples of the curve taken at evenly spaced recall values.
static double | getPRCArea(Instances tcurve) | Calculates the area under the precision-recall curve (AUPRC).
String | getRevision() | Returns the revision string.
static double | getROCArea(Instances tcurve) | Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
static int | getThresholdInstance(Instances tcurve, double threshold) | Gets the index of the instance whose threshold value is closest to the desired target.
Field Details

RELATION_NAME
The name of the relation used in threshold curve datasets.

TRUE_POS_NAME
attribute name: True Positives

FALSE_NEG_NAME
attribute name: False Negatives

FALSE_POS_NAME
attribute name: False Positives

TRUE_NEG_NAME
attribute name: True Negatives

FP_RATE_NAME
attribute name: False Positive Rate

TP_RATE_NAME
attribute name: True Positive Rate

PRECISION_NAME
attribute name: Precision

RECALL_NAME
attribute name: Recall

FALLOUT_NAME
attribute name: Fallout

FMEASURE_NAME
attribute name: FMeasure

SAMPLE_SIZE_NAME
attribute name: Sample Size

LIFT_NAME
attribute name: Lift

THRESHOLD_NAME
attribute name: Threshold

Constructor Details

ThresholdCurve
public ThresholdCurve()

Method Details

getCurve
public Instances getCurve(ArrayList<Prediction> predictions)
Calculates the performance stats for the default class and returns the results as a set of Instances. The structure of these Instances is as follows:
- True Positives
- False Negatives
- False Positives
- True Negatives
- False Positive Rate
- True Positive Rate
- Precision
- Recall
- Fallout
- Threshold: the probability threshold that gives rise to the preceding performance values.
For the definitions of these measures, see TwoClassStats.
- Parameters:
  predictions - the predictions to base the curve on
- Returns:
  the data points as a set of instances, or null if no predictions have been made.
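Each row's rate columns follow from its four counts. The sketch below spells out the standard textbook formulas. TwoClassStats is assumed, not confirmed here, to use the same formulas. The class DerivedMeasures is hypothetical and is not part of the Weka API. Its convention of returning 0 when a denominator is 0 is also an assumption.

```java
/** Hypothetical helper (not Weka API): measures derived from one row's confusion counts. */
final class DerivedMeasures {
    private DerivedMeasures() {}

    /** num / den, or 0 when den is 0 (an assumed convention for empty cells). */
    private static double ratio(double num, double den) {
        return den == 0 ? 0.0 : num / den;
    }

    /** TP / (TP + FN). */
    static double truePositiveRate(double tp, double fn) { return ratio(tp, tp + fn); }

    /** FP / (FP + TN). */
    static double falsePositiveRate(double fp, double tn) { return ratio(fp, fp + tn); }

    /** TP / (TP + FP). */
    static double precision(double tp, double fp) { return ratio(tp, tp + fp); }

    /** Recall is another name for the true positive rate. */
    static double recall(double tp, double fn) { return truePositiveRate(tp, fn); }

    /** Fallout is another name for the false positive rate. */
    static double fallout(double fp, double tn) { return falsePositiveRate(fp, tn); }

    /** Harmonic mean of precision and recall. */
    static double fMeasure(double tp, double fn, double fp) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return ratio(2 * p * r, p + r);
    }
}
```

The pairs Recall/True Positive Rate and Fallout/False Positive Rate are the same quantity under two names. This is why both appear in the attribute list.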

getCurve
public Instances getCurve(ArrayList<Prediction> predictions, int classIndex)
Calculates the performance stats for the desired class and returns the results as a set of Instances.
- Parameters:
  predictions - the predictions to base the curve on
  classIndex - the index of the class of interest
- Returns:
  the data points as a set of instances.
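A multi-class prediction set can be reduced to a two-class view of one class of interest. This sketch assumes the curve for classIndex treats that class as "positive" and every other class as "negative". The class OneVsRest is hypothetical. Its plain arrays stand in for Weka's Prediction objects and are not the real data structures.

```java
/** Hypothetical sketch (not Weka API): a two-class view of one class of interest. */
final class OneVsRest {
    /** Predicted probability of the class of interest, per instance. */
    final double[] positiveProb;
    /** Whether each instance's actual class is the class of interest. */
    final boolean[] isPositive;

    private OneVsRest(double[] positiveProb, boolean[] isPositive) {
        this.positiveProb = positiveProb;
        this.isPositive = isPositive;
    }

    /**
     * distributions[i] is the predicted class distribution of instance i and
     * actualClass[i] its true class index; classIndex becomes "positive".
     */
    static OneVsRest forClass(double[][] distributions, int[] actualClass, int classIndex) {
        if (distributions.length != actualClass.length) {
            throw new IllegalArgumentException("one actual class per distribution is required");
        }
        int n = distributions.length;
        double[] prob = new double[n];
        boolean[] pos = new boolean[n];
        for (int i = 0; i < n; i++) {
            prob[i] = distributions[i][classIndex];
            pos[i] = actualClass[i] == classIndex;
        }
        return new OneVsRest(prob, pos);
    }
}
```

After this reduction, the threshold is varied only over the probability of the chosen class. This is why a separate curve can be produced for each class index.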

getNPointPrecision
public static double getNPointPrecision(Instances tcurve, int n)
Calculates the n-point precision, which is the precision averaged over n samples of the curve taken at evenly spaced recall values.
- Parameters:
  tcurve - a previously extracted threshold curve (an Instances object)
  n - the number of points to average over
- Returns:
  the n-point precision.
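Here is a minimal sketch of n-point precision over plain arrays of recall and precision values. It samples at the recall levels 0, 1/(n-1), ..., 1. Between neighbouring curve points it assumes linear interpolation of precision, clamped to the end points. The class NPointPrecision is hypothetical. getNPointPrecision's exact interpolation and invalid-input handling may differ; for example, this sketch throws an exception instead of returning a special value.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Weka API): precision averaged over n evenly spaced recall levels. */
final class NPointPrecision {
    private NPointPrecision() {}

    static double compute(double[] recall, double[] precision, int n) {
        if (n < 2 || recall.length == 0 || recall.length != precision.length) {
            throw new IllegalArgumentException("need n >= 2 and non-empty arrays of equal length");
        }
        // Visit curve points in order of increasing recall.
        Integer[] order = new Integer[recall.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(recall[a], recall[b]));
        double sum = 0.0;
        for (int k = 0; k < n; k++) {
            double target = (double) k / (n - 1); // recall levels 0, 1/(n-1), ..., 1
            sum += precisionAt(recall, precision, order, target);
        }
        return sum / n;
    }

    /** Linear interpolation of precision at the target recall, clamped to the end points. */
    private static double precisionAt(double[] r, double[] p, Integer[] order, double target) {
        int first = order[0];
        int last = order[order.length - 1];
        if (target <= r[first]) return p[first];
        if (target >= r[last]) return p[last];
        for (int j = 1; j < order.length; j++) {
            int lo = order[j - 1];
            int hi = order[j];
            if (target <= r[hi]) {
                double span = r[hi] - r[lo];
                if (span == 0) return p[hi];
                double w = (target - r[lo]) / span;
                return p[lo] + w * (p[hi] - p[lo]);
            }
        }
        return p[last]; // unreachable: target < r[last] was checked above
    }
}
```

For example, the curve points (recall 0, precision 1.0), (0.5, 0.8) and (1.0, 0.5) with n = 3 give (1.0 + 0.8 + 0.5) / 3.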

getPRCArea
public static double getPRCArea(Instances tcurve)
Calculates the area under the precision-recall curve (AUPRC).
- Parameters:
  tcurve - a previously extracted threshold curve (an Instances object)
- Returns:
  the PRC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
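The area under a precision-recall curve can be approximated in several ways. The sketch below uses a step-wise sum, which is one common choice. It is not claimed to be the exact scheme that getPRCArea uses. Points are sorted by recall, and each increase in recall is weighted by the precision at the new point. The class PrcArea is hypothetical. It returns Double.NaN for unusable input, loosely mirroring the documented method.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Weka API): step-wise area under a precision-recall curve. */
final class PrcArea {
    private PrcArea() {}

    /** Returns Double.NaN for empty or mismatched input. */
    static double stepArea(double[] recall, double[] precision) {
        if (recall.length == 0 || recall.length != precision.length) {
            return Double.NaN;
        }
        Integer[] order = new Integer[recall.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(recall[a], recall[b]));
        double area = 0.0;
        double previousRecall = 0.0;
        for (int idx : order) {
            double width = recall[idx] - previousRecall;
            // Zero-width steps are skipped, so an undefined precision
            // (e.g. NaN at recall 0) cannot poison the sum.
            if (width > 0) {
                area += width * precision[idx];
            }
            previousRecall = recall[idx];
        }
        return area;
    }
}
```

A trapezoidal rule is a possible alternative. However, straight-line interpolation between precision-recall points is generally less appropriate than it is for ROC curves.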

getROCArea
public static double getROCArea(Instances tcurve)
Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
- Parameters:
  tcurve - a previously extracted threshold curve (an Instances object)
- Returns:
  the ROC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
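The Wilcoxon-Mann-Whitney statistic is the fraction of (positive, negative) instance pairs in which the positive instance receives the higher score. Ties count as one half. This value equals the trapezoidal area under the empirical ROC curve. The sketch below computes it directly from raw scores rather than from a threshold curve Instances object. It uses a hypothetical class, RocAuc. Its O(P·N) pair loop favours clarity over speed.

```java
/** Hypothetical sketch (not Weka API): ROC area as the Wilcoxon-Mann-Whitney statistic. */
final class RocAuc {
    private RocAuc() {}

    /**
     * Fraction of (positive, negative) pairs where the positive instance scores higher;
     * ties count as one half. Returns Double.NaN if either class is absent.
     */
    static double mannWhitney(double[] positiveProb, boolean[] isPositive) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < positiveProb.length; i++) {
            if (!isPositive[i]) continue;
            for (int j = 0; j < positiveProb.length; j++) {
                if (isPositive[j]) continue;
                pairs++;
                if (positiveProb[i] > positiveProb[j]) {
                    wins += 1.0;
                } else if (positiveProb[i] == positiveProb[j]) {
                    wins += 0.5;
                }
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```

For example, take positives scored 0.9 and 0.6 and negatives scored 0.8 and 0.4. Three of the four pairs are ordered correctly, so the area is 0.75.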

getThresholdInstance
public static int getThresholdInstance(Instances tcurve, double threshold)
Gets the index of the instance whose threshold value is closest to the desired target.
- Parameters:
  tcurve - a set of instances that has been generated by this class
  threshold - the target threshold
- Returns:
  the index of the instance whose threshold is closest to the target, or -1 if it could not be found (i.e. no data, or an invalid threshold target).
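Here is a sketch of the closest-threshold lookup over a plain array that holds the Threshold column. The class ClosestThreshold is hypothetical. It assumes that an invalid threshold target is one outside [0, 1] (or NaN). When two thresholds are equally close, it returns the earliest index. The real method may search differently, for example over sorted values.

```java
/** Hypothetical sketch (not Weka API): index of the threshold closest to a target. */
final class ClosestThreshold {
    private ClosestThreshold() {}

    /** Returns -1 for empty input or a target outside [0, 1] (assumed meaning of "invalid target"). */
    static int indexOf(double[] thresholds, double target) {
        // The negated range check also rejects NaN targets.
        if (thresholds.length == 0 || !(target >= 0.0 && target <= 1.0)) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < thresholds.length; i++) {
            // Strict '<' keeps the earliest index when two thresholds are equally close.
            if (Math.abs(thresholds[i] - target) < Math.abs(thresholds[best] - target)) {
                best = i;
            }
        }
        return best;
    }
}
```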

getRevision
public String getRevision()
Returns the revision string.
- Specified by:
  getRevision in interface RevisionHandler
- Returns:
  the revision