Interface ContextEncodedNgramLanguageModel<W>

  • Type Parameters:
    W - the type of words in the vocabulary
    All Superinterfaces:
    NgramLanguageModel<W>
    All Known Implementing Classes:
    AbstractContextEncodedNgramLanguageModel, ContextEncodedCachingLmWrapper, ContextEncodedProbBackoffLm

    public interface ContextEncodedNgramLanguageModel<W>
    extends NgramLanguageModel<W>
    Interface for language models that expose their internal context-encoding for more efficient queries. (Note: a language model implementation may use a context-encoding internally without implementing this interface.) A context-encoding represents an n-gram as an integer for the last word, plus an offset that serves as a logical pointer to the n-1 prefix words. The integers stand for words of type W in the vocabulary, and the mapping between vocabulary words and integers is managed by an instance of the WordIndexer class.
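    To make the encoding concrete, the following minimal sketch stores each n-gram as a (last word, prefix offset) pair, so that n-grams sharing a prefix share one pointer. The class ContextEncodingSketch and its methods are hypothetical names chosen for illustration and do not reflect the library's actual storage layout; word ids are plain ints standing in for what a WordIndexer would assign, and -1 as the marker for an empty prefix is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy context-encoded store (hypothetical; not the library's data structure). */
final class ContextEncodingSketch {
    // entries.get(k) holds every stored n-gram of k + 1 words as {lastWord, prefixOffset}.
    private final List<List<long[]>> entries = new ArrayList<>();
    // index.get(k) maps "prefixOffset:lastWord" to the offset of that n-gram.
    private final List<Map<String, Long>> index = new ArrayList<>();

    /** Stores an n-gram together with its prefixes and returns the n-gram's offset. */
    long add(int[] ngram) {
        long prefixOffset = -1L; // -1 marks the empty prefix in this sketch
        for (int k = 0; k < ngram.length; k++) {
            if (entries.size() <= k) {
                entries.add(new ArrayList<>());
                index.add(new HashMap<>());
            }
            String key = prefixOffset + ":" + ngram[k];
            Long offset = index.get(k).get(key);
            if (offset == null) {
                // First time this (prefix, word) pair is seen: append a new entry.
                offset = (long) entries.get(k).size();
                entries.get(k).add(new long[] {ngram[k], prefixOffset});
                index.get(k).put(key, offset);
            }
            prefixOffset = offset;
        }
        return prefixOffset;
    }

    /** Last word of the n-gram of (order + 1) words stored at the given offset. */
    int lastWord(int order, long offset) {
        return (int) entries.get(order).get((int) offset)[0];
    }

    /** Offset of the prefix of that n-gram, or -1 if the n-gram is a unigram. */
    long prefixOffset(int order, long offset) {
        return entries.get(order).get((int) offset)[1];
    }

    public static void main(String[] args) {
        ContextEncodingSketch store = new ContextEncodingSketch();
        long a = store.add(new int[] {3, 7, 5});
        long b = store.add(new int[] {3, 7, 9});
        // Both trigrams point at the same stored bigram (3, 7).
        System.out.println(store.prefixOffset(2, a) == store.prefixOffset(2, b));
    }
}
```

    In this layout a query that already holds the prefix offset only needs one lookup keyed on the next word, which is the efficiency the interface is meant to expose.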
    Author:
    adampauls
    • Method Detail

      • getLogProb

        float getLogProb(long contextOffset,
                         int contextOrder,
                         int word,
                         ContextEncodedNgramLanguageModel.LmContextInfo outputContext)
        Gets the score for an n-gram and also reports the context offset of the n-gram's suffix through outputContext.
        Parameters:
        contextOffset - Offset of context (prefix) of an n-gram
        contextOrder - The (0-based) length of the context (i.e. contextOrder == 0 iff the context is a unigram).
        word - Last word of the n-gram
        outputContext - Receives the offset of the suffix of the input n-gram, which can be passed to future queries for efficient access. If this parameter is null, it is ignored.
        Returns:
        the score (log probability) of the n-gram
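        A typical use is to score a sequence left to right, feeding the context written into outputContext by one query into the next query. The sketch below uses stand-in types (ContextInfo, ContextScorer) that mirror only the signature documented above, and it assumes that an offset and order of -1 denote the empty context; neither the field names nor that convention is confirmed by this page.

```java
/** Hypothetical stand-in for LmContextInfo; the field names are assumptions. */
final class ContextInfo {
    long offset = -1L; // assumed marker for the empty context
    int order = -1;    // assumed marker for the empty context
}

/** Hypothetical stand-in that mirrors only the getLogProb signature. */
interface ContextScorer {
    float getLogProb(long contextOffset, int contextOrder, int word, ContextInfo outputContext);
}

final class ThreadedScoringSketch {
    /** Sums per-word scores, reusing the context returned by each query. */
    static float scoreSentence(ContextScorer lm, int[] words) {
        ContextInfo context = new ContextInfo();
        float total = 0f;
        for (int word : words) {
            // The offset and order are read before the call, so the same
            // object can safely receive the output context.
            total += lm.getLogProb(context.offset, context.order, word, context);
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy scorer: -1 with an empty context, -2 otherwise; it remembers
        // only the last word as the next context.
        ContextScorer toy = (contextOffset, contextOrder, word, out) -> {
            if (out != null) {
                out.offset = word;
                out.order = 0;
            }
            return contextOrder < 0 ? -1f : -2f;
        };
        System.out.println(scoreSentence(toy, new int[] {4, 5, 6}));
    }
}
```

        Threading the context this way avoids looking up the prefix words again for every query.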
      • getOffsetForNgram

        ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram,
                                                                         int startPos,
                                                                         int endPos)
        Gets the offset that refers to an n-gram. If the n-gram is not in the model, it returns the offset of the longest suffix of the n-gram that is in the model. This operation is not necessarily fast.
      • getNgramForOffset

        int[] getNgramForOffset(long contextOffset,
                                int contextOrder,
                                int word)
        Gets the n-gram referred to by a context-encoding. This operation is not necessarily fast.
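        Conceptually, recovering the words means following prefix pointers back from the context and then appending the final word. The sketch below does this over two hand-built arrays; the table layout, the names OffsetDecodingSketch and ngramFor, and the use of -1 for an empty context are assumptions made for illustration, not the library's implementation.

```java
/** Hypothetical decoder over a toy table; not the library's implementation. */
final class OffsetDecodingSketch {
    /**
     * lastWord[k][i] is the last word of the i-th stored n-gram of k + 1 words;
     * prefixOffset[k][i] is the offset of its prefix (-1 for unigrams).
     */
    static int[] ngramFor(int[][] lastWord, long[][] prefixOffset,
                          long contextOffset, int contextOrder, int word) {
        int[] ngram = new int[contextOrder + 2]; // context words plus the final word
        ngram[ngram.length - 1] = word;
        long offset = contextOffset;
        // Walk from the last context word back to the first one.
        for (int order = contextOrder; order >= 0; order--) {
            ngram[order] = lastWord[order][(int) offset];
            offset = prefixOffset[order][(int) offset];
        }
        return ngram;
    }

    public static void main(String[] args) {
        int[][] lastWord = {{3, 8}, {7}};        // unigrams 3 and 8; bigram (3, 7)
        long[][] prefixOffset = {{-1, -1}, {0}}; // the bigram's prefix is unigram 0
        int[] ngram = ngramFor(lastWord, prefixOffset, 0L, 1, 5);
        System.out.println(java.util.Arrays.toString(ngram));
    }
}
```

        Each decoded word costs one pointer hop, so the work grows with the n-gram length, which is in line with the note that this operation is not necessarily fast.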