Class InterquartileRange
java.lang.Object
weka.filters.Filter
weka.filters.SimpleFilter
weka.filters.SimpleBatchFilter
weka.filters.unsupervised.attribute.InterquartileRange
- All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, WeightedAttributesHandler
A filter for detecting outliers and extreme values
based on interquartile ranges. The filter skips the class attribute.
Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR
Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR
Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, difference between Q1 and Q3
OF = Outlier Factor
EVF = Extreme Value Factor
Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns to base outlier/extreme value detection on. If an instance is an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier that the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
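The outlier and extreme value conditions above can be illustrated with a small standalone sketch. It does not use the Weka API; the class name IqrThresholdSketch, the enum Label, and the method classify are hypothetical names chosen for this example, and Q1 and Q3 are assumed to be already known.

```java
// Illustrative sketch only: not Weka's implementation.
final class IqrThresholdSketch {

    /** Hypothetical labels for a single numeric value. */
    enum Label { NORMAL, OUTLIER, EXTREME }

    /**
     * Applies the documented conditions:
     *   outlier: Q3 + OF*IQR < x <= Q3 + EVF*IQR  or  Q1 - EVF*IQR <= x < Q1 - OF*IQR
     *   extreme: x > Q3 + EVF*IQR                 or  x < Q1 - EVF*IQR
     */
    static Label classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Label.EXTREME;
        }
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Label.OUTLIER;
        }
        return Label.NORMAL;
    }

    public static void main(String[] args) {
        // Q1 = 10, Q3 = 20, so IQR = 10. With OF = 3 and EVF = 6 (the documented
        // defaults) the upper outlier band is (50, 80], the lower band is [-50, -20).
        System.out.println(classify(25, 10, 20, 3, 6));  // NORMAL
        System.out.println(classify(60, 10, 20, 3, 6));  // OUTLIER
        System.out.println(classify(81, 10, 20, 3, 6));  // EXTREME
    }
}
```

Note that a value exactly on the extreme value threshold (80 or -50 in this example) still counts as an outlier, because the extreme value conditions use strict inequalities.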
Thanks to Dale for a few brainstorming sessions.
- Version:
- $Revision: 15448 $
- Author:
- Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
-
Nested Class Summary
Nested Classes
static enum: enum for obtaining the various determined IQR values.
-
Field Summary
Fields
static final int NON_NUMERIC: indicator for non-numeric attributes
-
Constructor Summary
Constructors
InterquartileRange()
-
Method Summary
Methods
attributeIndicesTipText(): Returns the tip text for this property.
detectionPerAttributeTipText(): Returns the tip text for this property.
extremeValuesAsOutliersTipText(): Returns the tip text for this property.
extremeValuesFactorTipText(): Returns the tip text for this property.
getAttributeIndices(): Gets the current range selection.
getCapabilities(): Returns the Capabilities of this filter.
boolean getDetectionPerAttribute(): Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
boolean getExtremeValuesAsOutliers(): Gets whether extreme values are also tagged as outliers.
double getExtremeValuesFactor(): Gets the factor for determining the thresholds for extreme values.
String[] getOptions(): Gets the current settings of the filter.
double getOutlierFactor(): Gets the factor for determining the thresholds for outliers.
boolean getOutputOffsetMultiplier(): Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
getRevision(): Returns the revision string.
double[] getValues(type): Returns the values for the specified type.
globalInfo(): Returns a string describing this filter.
listOptions(): Returns an enumeration describing the available options.
static void main(args): Main method for testing this class.
outlierFactorTipText(): Returns the tip text for this property.
outputOffsetMultiplierTipText(): Returns the tip text for this property.
void setAttributeIndices(String value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void setAttributeIndicesArray(int[] value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void setDetectionPerAttribute(boolean value): Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
void setExtremeValuesAsOutliers(boolean value): Sets whether extreme values are also tagged as outliers.
void setExtremeValuesFactor(double value): Sets the factor for determining the thresholds for extreme values.
void setOptions(String[] options): Parses a list of options for this object.
void setOutlierFactor(double value): Sets the factor for determining the thresholds for outliers.
void setOutputOffsetMultiplier(boolean value): Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, input, input
Methods inherited from class weka.filters.SimpleFilter
setInputFormat
Methods inherited from class weka.filters.Filter
batchFilterFile, debugTipText, doNotCheckCapabilitiesTipText, filterFile, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, postExecution, preExecution, run, runFilter, setDebug, setDoNotCheckCapabilities, toString, useFilter, wekaStaticWrapper
-
Field Details
-
NON_NUMERIC
public static final int NON_NUMERIC
indicator for non-numeric attributes
-
-
Constructor Details
-
InterquartileRange
public InterquartileRange()
-
-
Method Details
-
globalInfo
Returns a string describing this filter.
- Specified by:
- globalInfo in class SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Overrides:
- listOptions in class Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a list of options for this object. Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns to base outlier/extreme value detection on. If an instance is an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier that the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
- Specified by:
- setOptions in interface OptionHandler
- Overrides:
- setOptions in class Filter
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
Gets the current settings of the filter.
- Specified by:
- getOptions in interface OptionHandler
- Overrides:
- getOptions in class Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
attributeIndicesTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getAttributeIndices
Gets the current range selection.
- Returns:
- a string containing a comma separated list of ranges
-
setAttributeIndices
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
- value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1. E.g.: first-3,5,6-last
- Throws:
- IllegalArgumentException - if an invalid range list is supplied
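To make the 1-based range syntax concrete, the following standalone sketch expands a string such as first-3,5,6-last into 0-based indexes. It is an illustration only and is not the parser Weka uses; the class RangeSketch and its methods are hypothetical, and details such as rejecting reversed ranges or keeping duplicates are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: not Weka's range parser.
final class RangeSketch {

    /** Expands a 1-based range list into 0-based indexes, in the order given. */
    static int[] parse(String range, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : range.split(",")) {
            part = part.trim();
            int dash = part.indexOf('-');
            int from;
            int to;
            if (dash < 0) {
                from = to = toIndex(part, numAttributes);
            } else {
                from = toIndex(part.substring(0, dash), numAttributes);
                to = toIndex(part.substring(dash + 1), numAttributes);
            }
            if (from > to) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = from; i <= to; i++) {
                result.add(i);
            }
        }
        int[] indexes = new int[result.size()];
        for (int i = 0; i < indexes.length; i++) {
            indexes[i] = result.get(i);
        }
        return indexes;
    }

    /** Converts one token ('first', 'last', or a 1-based number) to a 0-based index. */
    private static int toIndex(String token, int numAttributes) {
        token = token.trim();
        if (token.equals("first")) {
            return 0;
        }
        if (token.equals("last")) {
            return numAttributes - 1;
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int oneBased = Integer.parseInt(token);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }

    public static void main(String[] args) {
        // With 8 attributes, "first-3,5,6-last" selects attributes 1-3, 5, and 6-8.
        System.out.println(Arrays.toString(parse("first-3,5,6-last", 8)));
        // prints [0, 1, 2, 4, 5, 6, 7]
    }
}
```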
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
- value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
- Throws:
- IllegalArgumentException - if an invalid set of ranges is supplied
-
outlierFactorTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOutlierFactor
public void setOutlierFactor(double value)
Sets the factor for determining the thresholds for outliers.
- Parameters:
- value - the factor.
-
getOutlierFactor
public double getOutlierFactor()
Gets the factor for determining the thresholds for outliers.
- Returns:
- the factor.
-
extremeValuesFactorTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setExtremeValuesFactor
public void setExtremeValuesFactor(double value)
Sets the factor for determining the thresholds for extreme values.
- Parameters:
- value - the factor.
-
getExtremeValuesFactor
public double getExtremeValuesFactor()
Gets the factor for determining the thresholds for extreme values.
- Returns:
- the factor.
-
extremeValuesAsOutliersTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setExtremeValuesAsOutliers
public void setExtremeValuesAsOutliers(boolean value)
Sets whether extreme values are also tagged as outliers.
- Parameters:
- value - whether or not to tag extreme values also as outliers.
-
getExtremeValuesAsOutliers
public boolean getExtremeValuesAsOutliers()
Gets whether extreme values are also tagged as outliers.
- Returns:
- true if extreme values are also tagged as outliers.
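The effect of this setting on the two indicator flags can be sketched as follows. This is a standalone illustration derived from the documented conditions, not Weka's code; the class ExtremeAsOutlierSketch and the method flags are hypothetical, and Q1 and Q3 are assumed to be known.

```java
// Illustrative sketch only: not Weka's implementation.
final class ExtremeAsOutlierSketch {

    /**
     * Returns {outlier, extreme} for one value. By the documented conditions an
     * extreme value is not an outlier, unless extremeAsOutlier is switched on.
     */
    static boolean[] flags(double x, double q1, double q3,
                           double of, double evf, boolean extremeAsOutlier) {
        double iqr = q3 - q1;
        boolean extreme = x > q3 + evf * iqr || x < q1 - evf * iqr;
        boolean beyondOutlierThreshold = x > q3 + of * iqr || x < q1 - of * iqr;
        boolean outlier = extreme ? extremeAsOutlier : beyondOutlierThreshold;
        return new boolean[] {outlier, extreme};
    }

    public static void main(String[] args) {
        // Q1 = 10, Q3 = 20, OF = 3, EVF = 6: the value 90 lies above Q3 + EVF*IQR = 80.
        boolean[] off = flags(90, 10, 20, 3, 6, false);
        boolean[] on = flags(90, 10, 20, 3, 6, true);
        System.out.println(off[0] + " " + off[1]);  // false true
        System.out.println(on[0] + " " + on[1]);    // true true
    }
}
```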
-
detectionPerAttributeTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDetectionPerAttribute
public void setDetectionPerAttribute(boolean value)
Sets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Parameters:
- value - whether or not to generate indicator attribute pairs for each numeric attribute.
-
getDetectionPerAttribute
public boolean getDetectionPerAttribute()
Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Returns:
- true if indicator attribute pairs are generated for each numeric attribute.
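The difference between a single indicator pair and per-attribute pairs can be sketched as below. The sketch is inferred from the option descriptions only and has not been checked against the filter's output format; the class IndicatorLayoutSketch and both methods are hypothetical, and at least one numeric attribute is assumed to be selected.

```java
// Illustrative sketch only: inferred from the option descriptions.
final class IndicatorLayoutSketch {

    /**
     * Single-pair mode: an instance is tagged if it is flagged
     * in at least one of the selected attributes.
     */
    static boolean anyFlagged(boolean[] perAttributeFlags) {
        for (boolean flagged : perAttributeFlags) {
            if (flagged) {
                return true;
            }
        }
        return false;
    }

    /**
     * Number of generated attributes: one Outlier/ExtremeValue pair in total,
     * or one pair per numeric attribute, plus one Offset attribute per pair
     * when the offset multiplier is requested (which implies per-attribute mode).
     */
    static int generatedAttributeCount(int numericAttributes,
                                       boolean perAttribute, boolean offset) {
        if (!perAttribute && !offset) {
            return 2;
        }
        return numericAttributes * (offset ? 3 : 2);
    }

    public static void main(String[] args) {
        System.out.println(anyFlagged(new boolean[] {false, true, false}));  // true
        System.out.println(generatedAttributeCount(4, false, false));        // 2
        System.out.println(generatedAttributeCount(4, true, false));         // 8
        System.out.println(generatedAttributeCount(4, false, true));         // 12
    }
}
```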
-
outputOffsetMultiplierTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOutputOffsetMultiplier
public void setOutputOffsetMultiplier(boolean value)
Sets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
- Parameters:
- value - whether or not to generate the additional attribute.
-
getOutputOffsetMultiplier
public boolean getOutputOffsetMultiplier()
Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
- Returns:
- true if the additional attribute is generated.
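Solving value = median + 'multiplier' * IQR for the multiplier gives multiplier = (value - median) / IQR. The standalone sketch below shows that arithmetic; the class OffsetMultiplierSketch and its methods are hypothetical, and how the filter itself treats an IQR of zero is not covered here.

```java
// Illustrative sketch only: plain arithmetic on the documented formula.
final class OffsetMultiplierSketch {

    /** Multiplier such that value = median + multiplier * iqr (iqr must be non-zero). */
    static double multiplier(double value, double median, double iqr) {
        return (value - median) / iqr;
    }

    /** Inverse direction: rebuilds the value from the multiplier. */
    static double reconstruct(double median, double multiplier, double iqr) {
        return median + multiplier * iqr;
    }

    public static void main(String[] args) {
        // Median = 15, IQR = 10: the value 35 lies two IQRs above the median.
        double m = multiplier(35, 15, 10);
        System.out.println(m);                       // 2.0
        System.out.println(reconstruct(15, m, 10));  // 35.0
    }
}
```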
-
getCapabilities
Returns the Capabilities of this filter.
- Specified by:
- getCapabilities in interface CapabilitiesHandler
- Overrides:
- getCapabilities in class Filter
- Returns:
- the capabilities of this object
-
getValues
Returns the values for the specified type.
- Parameters:
- type - the type of values to return
- Returns:
- the values
-
getRevision
Returns the revision string.
- Specified by:
- getRevision in interface RevisionHandler
- Overrides:
- getRevision in class Filter
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
- args - should contain arguments to the filter: use -h for help
-