Package edu.berkeley.nlp.lm
Class StringWordIndexer
- java.lang.Object
  - edu.berkeley.nlp.lm.StringWordIndexer
- All Implemented Interfaces: WordIndexer<java.lang.String>, java.io.Serializable
public class StringWordIndexer extends java.lang.Object implements WordIndexer<java.lang.String>
Implementation of a WordIndexer in which words are represented as strings.
- Author: adampauls
- See Also: Serialized Form
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.WordIndexer:
- WordIndexer.StaticMethods
Constructor Summary
- StringWordIndexer()
Method Summary
- java.lang.String getEndSymbol(): Returns the end symbol (usually something like </s>).
- int getIndexPossiblyUnk(java.lang.String word): Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
- int getOrAddIndex(java.lang.String word): Gets the index for a word, adding it to the vocabulary if necessary.
- int getOrAddIndexFromString(java.lang.String word)
- java.lang.String getStartSymbol(): Returns the start symbol (usually something like <s>).
- java.lang.String getUnkSymbol(): Returns the unk symbol (usually something like <unk>).
- java.lang.String getWord(int index): Gets the word object for an index.
- int numWords(): Number of words that have been added so far.
- void setEndSymbol(java.lang.String sym)
- void setStartSymbol(java.lang.String sym)
- void setUnkSymbol(java.lang.String sym)
- void trimAndLock(): Informs the implementation that no more words can be added to the vocabulary.
Method Detail
getOrAddIndex
public int getOrAddIndex(java.lang.String word)
Description copied from interface: WordIndexer
Gets the index for a word, adding it to the vocabulary if necessary.
- Specified by: getOrAddIndex in interface WordIndexer<java.lang.String>
getWord
public java.lang.String getWord(int index)
Description copied from interface: WordIndexer
Gets the word object for an index.
- Specified by: getWord in interface WordIndexer<java.lang.String>
numWords
public int numWords()
Description copied from interface: WordIndexer
Number of words that have been added so far.
- Specified by: numWords in interface WordIndexer<java.lang.String>
getStartSymbol
public java.lang.String getStartSymbol()
Description copied from interface: WordIndexer
Returns the start symbol (usually something like <s>).
- Specified by: getStartSymbol in interface WordIndexer<java.lang.String>
getEndSymbol
public java.lang.String getEndSymbol()
Description copied from interface: WordIndexer
Returns the end symbol (usually something like </s>).
- Specified by: getEndSymbol in interface WordIndexer<java.lang.String>
getUnkSymbol
public java.lang.String getUnkSymbol()
Description copied from interface: WordIndexer
Returns the unk symbol (usually something like <unk>).
- Specified by: getUnkSymbol in interface WordIndexer<java.lang.String>
getOrAddIndexFromString
public int getOrAddIndexFromString(java.lang.String word)
- Specified by: getOrAddIndexFromString in interface WordIndexer<java.lang.String>
setStartSymbol
public void setStartSymbol(java.lang.String sym)
- Specified by: setStartSymbol in interface WordIndexer<java.lang.String>
setEndSymbol
public void setEndSymbol(java.lang.String sym)
- Specified by: setEndSymbol in interface WordIndexer<java.lang.String>
setUnkSymbol
public void setUnkSymbol(java.lang.String sym)
- Specified by: setUnkSymbol in interface WordIndexer<java.lang.String>
trimAndLock
public void trimAndLock()
Description copied from interface: WordIndexer
Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.
- Specified by: trimAndLock in interface WordIndexer<java.lang.String>
getIndexPossiblyUnk
public int getIndexPossiblyUnk(java.lang.String word)
Description copied from interface: WordIndexer
Should never add to the vocabulary, and should return the index of the unk symbol (see getUnkSymbol()) if the word is not in the vocabulary.
- Specified by: getIndexPossiblyUnk in interface WordIndexer<java.lang.String>
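The contract described by getOrAddIndex, trimAndLock, and getIndexPossiblyUnk can be illustrated with a small self-contained sketch. VocabularySketch below is a hypothetical stand-in written for this illustration; it is not the StringWordIndexer implementation. It assumes that the unk symbol is registered first (so it receives index 0) and that adding a word after locking raises IllegalStateException. The documentation above only says an error "should" be triggered and does not name the exception type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for illustration only; NOT the berkeleylm StringWordIndexer. */
final class VocabularySketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final ArrayList<String> words = new ArrayList<>();
    private final int unkIndex;
    private boolean locked = false;

    VocabularySketch(String unkSymbol) {
        // Assumption: the unk symbol is added first, so it gets index 0.
        this.unkIndex = getOrAddIndex(unkSymbol);
    }

    /** Returns the index of a word, adding it if it is new and the vocabulary is unlocked. */
    int getOrAddIndex(String word) {
        Integer existing = indexOf.get(word);
        if (existing != null) {
            return existing;
        }
        if (locked) {
            // Assumption: the exception type is this sketch's choice.
            throw new IllegalStateException("Vocabulary is locked; cannot add: " + word);
        }
        int index = words.size();
        indexOf.put(word, index);
        words.add(word);
        return index;
    }

    /** Never adds; falls back to the unk index for out-of-vocabulary words. */
    int getIndexPossiblyUnk(String word) {
        Integer existing = indexOf.get(word);
        return existing != null ? existing : unkIndex;
    }

    String getWord(int index) {
        return words.get(index);
    }

    int numWords() {
        return words.size();
    }

    /** Frees spare list capacity and rejects any further additions. */
    void trimAndLock() {
        words.trimToSize();
        locked = true;
    }

    public static void main(String[] args) {
        VocabularySketch vocab = new VocabularySketch("<unk>");
        int the = vocab.getOrAddIndex("the");
        vocab.trimAndLock();
        System.out.println(vocab.getWord(the));                // known word round-trips
        System.out.println(vocab.getIndexPossiblyUnk("cat"));  // unseen word maps to the unk index
        System.out.println(vocab.numWords());                  // lookup did not grow the vocabulary
    }
}
```

In this sketch, lookups through getIndexPossiblyUnk are safe both before and after locking, while getOrAddIndex after trimAndLock succeeds only for words that are already present.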