Package weka.core.tokenizers
Class WordTokenizer
java.lang.Object
weka.core.tokenizers.Tokenizer
weka.core.tokenizers.CharacterDelimitedTokenizer
weka.core.tokenizers.WordTokenizer
- All Implemented Interfaces:
- Serializable, Enumeration<String>, OptionHandler, RevisionHandler
A simple tokenizer that uses the java.util.StringTokenizer class to tokenize strings.
Valid options are:
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
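The effect of the default delimiter set can be sketched with java.util.StringTokenizer alone, without Weka on the classpath. The class name DefaultDelimiterSketch and the helper tokens are hypothetical, and the sketch assumes that \r, \n and \t in the option default stand for the corresponding control characters. It approximates the behavior described above and is not the Weka implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration class; not part of Weka.
public class DefaultDelimiterSketch {
    // Delimiter set as listed for the -delimiters option
    // (assumption: \r, \n, \t denote control characters).
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Collects every token of the text, splitting on any delimiter character.
    static List<String> tokens(String text, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It, s, a, test]
        System.out.println(tokens("Hello, world! (It's a test.)", DEFAULT_DELIMITERS));
    }
}
```

Because the apostrophe is part of the default set, a contraction such as "It's" splits into two tokens. Passing a narrower delimiter string, for example a single space, keeps punctuation attached to the words.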
- Version:
- $Revision: 10203 $
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
Constructor Summary
Constructors
WordTokenizer()
Method Summary
Modifier and Type    Method                 Description
String               getRevision()          Returns the revision string.
String               globalInfo()           Returns a string describing the tokenizer.
boolean              hasMoreElements()      Tests if this enumeration contains more elements.
static void          main(String[] args)    Runs the tokenizer with the given options and strings to tokenize.
String               nextElement()          Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
void                 tokenize(String s)     Sets the string to tokenize.

Methods inherited from class weka.core.tokenizers.CharacterDelimitedTokenizer
delimitersTipText, getDelimiters, getOptions, listOptions, setDelimiters, setOptions
Methods inherited from class weka.core.tokenizers.Tokenizer
runTokenizer, tokenize
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Enumeration
asIterator
Constructor Details
WordTokenizer
public WordTokenizer()
Method Details
globalInfo
public String globalInfo()
Returns a string describing the tokenizer.
- Specified by:
- globalInfo in class Tokenizer
- Returns:
- a description suitable for displaying in the explorer/experimenter GUI
hasMoreElements
public boolean hasMoreElements()
Tests if this enumeration contains more elements.
- Specified by:
- hasMoreElements in interface Enumeration<String>
- Specified by:
- hasMoreElements in class Tokenizer
- Returns:
- true if and only if this enumeration object contains at least one more element to provide; false otherwise.
nextElement
public String nextElement()
Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
- Specified by:
- nextElement in interface Enumeration<String>
- Specified by:
- nextElement in class Tokenizer
- Returns:
- the next element of this enumeration.
tokenize
public void tokenize(String s)
Sets the string to tokenize. Tokenization happens immediately.
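The interplay of tokenize, hasMoreElements and nextElement can be illustrated with a small stand-in that wraps java.util.StringTokenizer behind Enumeration<String>. EnumerationTokenizerSketch is a hypothetical class written for this example. It assumes that a new call to tokenize simply replaces the previous input, and it makes no claim about how Weka implements the class internally.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical stand-in mirroring the tokenize / hasMoreElements / nextElement
// contract described above; not the Weka class itself.
public class EnumerationTokenizerSketch implements Enumeration<String> {
    private final String delimiters;
    private StringTokenizer tokenizer; // null until tokenize() has been called

    public EnumerationTokenizerSketch(String delimiters) {
        this.delimiters = delimiters;
    }

    // Sets the string to tokenize; any earlier input is discarded (assumption).
    public void tokenize(String s) {
        tokenizer = new StringTokenizer(s, delimiters);
    }

    @Override
    public boolean hasMoreElements() {
        return tokenizer != null && tokenizer.hasMoreTokens();
    }

    @Override
    public String nextElement() {
        if (tokenizer == null) {
            throw new NoSuchElementException("tokenize() has not been called");
        }
        return tokenizer.nextToken(); // throws NoSuchElementException when exhausted
    }

    public static void main(String[] args) {
        EnumerationTokenizerSketch t = new EnumerationTokenizerSketch(" ,.");
        t.tokenize("one, two. three");
        while (t.hasMoreElements()) {
            System.out.println(t.nextElement()); // prints one, two, three on separate lines
        }
    }
}
```

Guarding each nextElement call with hasMoreElements, as in the loop above, avoids the NoSuchElementException that java.util.StringTokenizer raises once its input is exhausted.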
getRevision
public String getRevision()
Returns the revision string.
- Returns:
- the revision
main
public static void main(String[] args)
Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.
- Parameters:
- args - the commandline options and strings to tokenize