Class StringWordIndexer

  • All Implemented Interfaces:
    WordIndexer<java.lang.String>, java.io.Serializable

    public class StringWordIndexer
    extends java.lang.Object
    implements WordIndexer<java.lang.String>
    Implementation of a WordIndexer in which words are represented as strings.
    Author:
    adampauls
    See Also:
    Serialized Form
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getEndSymbol()
          Returns the end symbol (usually something like </s>).
      int getIndexPossiblyUnk(java.lang.String word)
          Should never add to the vocabulary, and should return the index of getUnkSymbol() if the word is not in the vocabulary.
      int getOrAddIndex(java.lang.String word)
          Gets the index for a word, adding if necessary.
      int getOrAddIndexFromString(java.lang.String word)
      java.lang.String getStartSymbol()
          Returns the start symbol (usually something like <s>).
      java.lang.String getUnkSymbol()
          Returns the unk symbol (usually something like <unk>).
      java.lang.String getWord(int index)
          Gets the word object for an index.
      int numWords()
          Number of words that have been added so far.
      void setEndSymbol(java.lang.String sym)
      void setStartSymbol(java.lang.String sym)
      void setUnkSymbol(java.lang.String sym)
      void trimAndLock()
          Informs the implementation that no more words can be added to the vocabulary.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StringWordIndexer

        public StringWordIndexer()
    • Method Detail

      • getOrAddIndex

        public int getOrAddIndex(java.lang.String word)
        Description copied from interface: WordIndexer
        Gets the index for a word, adding if necessary.
        Specified by:
        getOrAddIndex in interface WordIndexer<java.lang.String>
        Returns:
        the index of word
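A minimal sketch of the get-or-add contract, assuming indices are dense and handed out in insertion order starting from 0 (an assumption; this page does not say how indices are assigned). ToyGetOrAddIndexer is a hypothetical class written for illustration, not the StringWordIndexer source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy indexer illustrating getOrAddIndex; not the library's implementation.
final class ToyGetOrAddIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final List<String> indexToWord = new ArrayList<>();

    // Returns the existing index for word, or assigns the next free index (assumed 0, 1, 2, ...).
    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing;
        }
        int index = indexToWord.size();
        wordToIndex.put(word, index);
        indexToWord.add(word);
        return index;
    }
}

class ToyGetOrAddIndexerDemo {
    public static void main(String[] args) {
        ToyGetOrAddIndexer indexer = new ToyGetOrAddIndexer();
        System.out.println(indexer.getOrAddIndex("the")); // 0
        System.out.println(indexer.getOrAddIndex("cat")); // 1
        System.out.println(indexer.getOrAddIndex("the")); // 0 again: nothing new is added
    }
}
```

Keeping both a map and a list is one common design for this kind of indexer: the map answers word-to-index queries and the list answers the reverse, both in expected constant time.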
      • getWord

        public java.lang.String getWord(int index)
        Description copied from interface: WordIndexer
        Gets the word object for an index.
        Specified by:
        getWord in interface WordIndexer<java.lang.String>
        Returns:
        the word stored at index
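The reverse lookup can be sketched with a list indexed by position. ToyReverseIndexer is hypothetical; in this toy an unassigned index makes ArrayList.get throw IndexOutOfBoundsException, which may differ from what StringWordIndexer does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy: a forward map plus a reverse list; not the library's implementation.
final class ToyReverseIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final List<String> indexToWord = new ArrayList<>();

    int getOrAddIndex(String word) {
        // On first sight, append the word and use its list position as its index.
        return wordToIndex.computeIfAbsent(word, w -> {
            indexToWord.add(w);
            return indexToWord.size() - 1;
        });
    }

    // Constant-time reverse lookup; throws IndexOutOfBoundsException for unassigned indices.
    String getWord(int index) {
        return indexToWord.get(index);
    }
}

class ToyReverseIndexerDemo {
    public static void main(String[] args) {
        ToyReverseIndexer indexer = new ToyReverseIndexer();
        for (String w : new String[] {"the", "cat", "sat"}) {
            int i = indexer.getOrAddIndex(w);
            // Round trip: getWord(getOrAddIndex(w)) returns w.
            System.out.println(i + " -> " + indexer.getWord(i));
        }
    }
}
```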
      • numWords

        public int numWords()
        Description copied from interface: WordIndexer
        Number of words that have been added so far.
        Specified by:
        numWords in interface WordIndexer<java.lang.String>
        Returns:
        the number of words added so far
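Because a repeated word reuses its existing index, numWords counts distinct words rather than calls to getOrAddIndex. A hypothetical sketch (ToyCountingIndexer is illustrative, not the library class):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy indexer; not the library's implementation.
final class ToyCountingIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final List<String> indexToWord = new ArrayList<>();

    int getOrAddIndex(String word) {
        return wordToIndex.computeIfAbsent(word, w -> {
            indexToWord.add(w);
            return indexToWord.size() - 1;
        });
    }

    // Number of distinct words added so far, not the number of getOrAddIndex calls.
    int numWords() {
        return indexToWord.size();
    }
}

class ToyCountingIndexerDemo {
    public static void main(String[] args) {
        ToyCountingIndexer indexer = new ToyCountingIndexer();
        String[] tokens = "the cat saw the dog".split(" ");
        for (String t : tokens) {
            indexer.getOrAddIndex(t);
        }
        // prints "5 tokens, 4 distinct words": "the" is added only once
        System.out.println(tokens.length + " tokens, " + indexer.numWords() + " distinct words");
    }
}
```

Under this toy's dense-index assumption, numWords() is also one more than the largest index handed out so far.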
      • getStartSymbol

        public java.lang.String getStartSymbol()
        Description copied from interface: WordIndexer
        Returns the start symbol (usually something like <s>).
        Specified by:
        getStartSymbol in interface WordIndexer<java.lang.String>
        Returns:
        the start symbol
      • getEndSymbol

        public java.lang.String getEndSymbol()
        Description copied from interface: WordIndexer
        Returns the end symbol (usually something like </s>).
        Specified by:
        getEndSymbol in interface WordIndexer<java.lang.String>
        Returns:
        the end symbol
      • getUnkSymbol

        public java.lang.String getUnkSymbol()
        Description copied from interface: WordIndexer
        Returns the unk symbol (usually something like <unk>).
        Specified by:
        getUnkSymbol in interface WordIndexer<java.lang.String>
        Returns:
        the unk symbol
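The start and end symbols are typically used when preparing text for an n-gram model: each sentence is wrapped in boundary markers before its words are indexed. The sketch below assumes the defaults <s>, </s>, and <unk> suggested by the "usually something like" wording; ToySymbols and pad are hypothetical helpers, not part of the documented API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder of the special symbols; these defaults are assumptions,
// not values read from StringWordIndexer.
final class ToySymbols {
    String getStartSymbol() { return "<s>"; }
    String getEndSymbol() { return "</s>"; }
    String getUnkSymbol() { return "<unk>"; }
}

class ToySymbolsDemo {
    // Hypothetical helper: wraps a tokenized sentence in the start and end markers.
    static List<String> pad(ToySymbols symbols, List<String> tokens) {
        List<String> padded = new ArrayList<>(tokens.size() + 2);
        padded.add(symbols.getStartSymbol());
        padded.addAll(tokens);
        padded.add(symbols.getEndSymbol());
        return padded;
    }

    public static void main(String[] args) {
        // prints [<s>, the, cat, </s>]
        System.out.println(pad(new ToySymbols(), Arrays.asList("the", "cat")));
    }
}
```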
      • getOrAddIndexFromString

        public int getOrAddIndexFromString(java.lang.String word)
        Specified by:
        getOrAddIndexFromString in interface WordIndexer<java.lang.String>
      • setStartSymbol

        public void setStartSymbol(java.lang.String sym)
        Specified by:
        setStartSymbol in interface WordIndexer<java.lang.String>
      • setEndSymbol

        public void setEndSymbol(java.lang.String sym)
        Specified by:
        setEndSymbol in interface WordIndexer<java.lang.String>
      • setUnkSymbol

        public void setUnkSymbol(java.lang.String sym)
        Specified by:
        setUnkSymbol in interface WordIndexer<java.lang.String>
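The three setters carry no description on this page; presumably each one replaces the corresponding special symbol, for example to match a corpus that marks sentence boundaries differently. A hypothetical sketch of that reading (ToyConfigurableSymbols is illustrative, and its defaults are assumed):

```java
// Hypothetical mutable symbol configuration mirroring the documented setters;
// the default values are assumptions, not taken from StringWordIndexer.
final class ToyConfigurableSymbols {
    private String startSymbol = "<s>";
    private String endSymbol = "</s>";
    private String unkSymbol = "<unk>";

    void setStartSymbol(String sym) { startSymbol = sym; }
    void setEndSymbol(String sym) { endSymbol = sym; }
    void setUnkSymbol(String sym) { unkSymbol = sym; }

    String getStartSymbol() { return startSymbol; }
    String getEndSymbol() { return endSymbol; }
    String getUnkSymbol() { return unkSymbol; }
}

class ToyConfigurableSymbolsDemo {
    public static void main(String[] args) {
        ToyConfigurableSymbols symbols = new ToyConfigurableSymbols();
        // Match a corpus that uses upper-case markers (a hypothetical convention).
        symbols.setStartSymbol("<S>");
        symbols.setEndSymbol("</S>");
        symbols.setUnkSymbol("<UNK>");
        // prints "<S> </S> <UNK>"
        System.out.println(symbols.getStartSymbol() + " " + symbols.getEndSymbol() + " " + symbols.getUnkSymbol());
    }
}
```

If the real setters behave this way, it seems sensible to call them before indexing any text, so that the symbols stored in the vocabulary match the ones used at query time; that ordering is a suggestion, not a documented requirement.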
      • trimAndLock

        public void trimAndLock()
        Description copied from interface: WordIndexer
        Informs the implementation that no more words can be added to the vocabulary. Implementations may perform some space optimization, and should trigger an error if an attempt is made to add a word after this point.
        Specified by:
        trimAndLock in interface WordIndexer<java.lang.String>
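The lock-then-reject behavior described above can be sketched as follows. ToyLockableIndexer is hypothetical: throwing IllegalStateException, and trimming the backing ArrayList as the only space optimization, are assumptions; the documentation only says that an error should be triggered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy indexer with a lock; not the library's implementation.
final class ToyLockableIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final ArrayList<String> indexToWord = new ArrayList<>();
    private boolean locked = false;

    int getOrAddIndex(String word) {
        Integer existing = wordToIndex.get(word);
        if (existing != null) {
            return existing; // in this toy, known words still resolve after locking
        }
        if (locked) {
            throw new IllegalStateException("vocabulary is locked; cannot add: " + word);
        }
        int index = indexToWord.size();
        wordToIndex.put(word, index);
        indexToWord.add(word);
        return index;
    }

    // Freezes the vocabulary and releases spare list capacity (this toy's only space optimization).
    void trimAndLock() {
        indexToWord.trimToSize();
        locked = true;
    }
}

class ToyLockableIndexerDemo {
    public static void main(String[] args) {
        ToyLockableIndexer indexer = new ToyLockableIndexer();
        indexer.getOrAddIndex("the");
        indexer.trimAndLock();
        System.out.println(indexer.getOrAddIndex("the")); // 0: known word, nothing added
        try {
            indexer.getOrAddIndex("zebra");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```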
      • getIndexPossiblyUnk

        public int getIndexPossiblyUnk(java.lang.String word)
        Description copied from interface: WordIndexer
        Should never add to the vocabulary, and should return the index of getUnkSymbol() if the word is not in the vocabulary.
        Specified by:
        getIndexPossiblyUnk in interface WordIndexer<java.lang.String>
        Returns:
        the index of word, or the index of the unk symbol if word is not in the vocabulary
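A sketch of an unk-aware lookup that never grows the vocabulary. ToyUnkIndexer is hypothetical; it assumes the unk symbol is spelled <unk> and is itself indexed up front, so that unknown words have an index to fall back on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy; not the library's implementation.
final class ToyUnkIndexer {
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    private final String unkSymbol = "<unk>"; // assumed spelling of the unk symbol

    ToyUnkIndexer(String... vocabulary) {
        add(unkSymbol); // index the unk symbol first so lookups can fall back to it
        for (String w : vocabulary) {
            add(w);
        }
    }

    private void add(String word) {
        // Next dense index is the current size; no-op if the word is already present.
        wordToIndex.putIfAbsent(word, wordToIndex.size());
    }

    // Never adds: unknown words map to the index of the unk symbol.
    int getIndexPossiblyUnk(String word) {
        Integer index = wordToIndex.get(word);
        return index != null ? index : wordToIndex.get(unkSymbol);
    }

    int numWords() {
        return wordToIndex.size();
    }
}

class ToyUnkIndexerDemo {
    public static void main(String[] args) {
        ToyUnkIndexer indexer = new ToyUnkIndexer("the", "cat");
        System.out.println("cat -> " + indexer.getIndexPossiblyUnk("cat"));     // cat -> 2
        System.out.println("zebra -> " + indexer.getIndexPossiblyUnk("zebra")); // zebra -> 0 (unk)
        System.out.println("numWords = " + indexer.numWords());                  // numWords = 3
    }
}
```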