Package edu.berkeley.nlp.lm
Class StupidBackoffLm<W>
- java.lang.Object
  - edu.berkeley.nlp.lm.AbstractNgramLanguageModel<W>
    - edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel<W>
      - edu.berkeley.nlp.lm.StupidBackoffLm<W>
- Type Parameters:
W
- All Implemented Interfaces:
ArrayEncodedNgramLanguageModel<W>, NgramLanguageModel<W>, java.io.Serializable
public class StupidBackoffLm<W> extends AbstractArrayEncodedNgramLanguageModel<W> implements ArrayEncodedNgramLanguageModel<W>, java.io.Serializable
Language model implementation which uses the stupid backoff computation (Brants et al., 2007). Note that stupid backoff does not properly normalize, so the scores this LM computes are not in fact probabilities. Also, unlike LMs estimated using LmReaders.createKneserNeyLmFromTextFiles, this model returns natural logarithms instead of log10.
- Author:
- adampauls
- See Also:
- Serialized Form
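The scoring rule named in the class description can be illustrated with a minimal, self-contained sketch. It is not the code of this class: StupidBackoffSketch, ALPHA, addSentence and logScore are hypothetical names, the counts live in a plain HashMap rather than an NgramMap, and the backoff factor 0.4 is the value suggested by Brants et al. (2007), assumed here. A word's score is its relative frequency given the longest context with a nonzero count, multiplied by the backoff factor once per dropped context word, and reported as a natural logarithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stupid-backoff scorer; hypothetical names, not the berkeleylm implementation.
final class StupidBackoffSketch {
    // Backoff factor: 0.4 is the value suggested by Brants et al. (2007); assumed here.
    static final double ALPHA = 0.4;

    private final Map<List<String>, Long> counts = new HashMap<>();
    private final int order;
    private long totalTokens = 0;

    StupidBackoffSketch(int order) {
        this.order = order;
    }

    // Counts every n-gram of length 1..order that occurs in the sentence.
    void addSentence(List<String> sentence) {
        for (int end = 1; end <= sentence.size(); end++) {
            for (int start = Math.max(0, end - order); start < end; start++) {
                counts.merge(new ArrayList<>(sentence.subList(start, end)), 1L, Long::sum);
            }
        }
        totalTokens += sentence.size();
    }

    // Natural-log score of the last word given the words before it.
    double logScore(List<String> ngram) {
        double logPenalty = 0.0;
        int first = Math.max(0, ngram.size() - order);
        for (int start = first; start < ngram.size(); start++) {
            List<String> span = ngram.subList(start, ngram.size());
            long numerator = counts.getOrDefault(span, 0L);
            if (numerator > 0) {
                // Unigrams are normalized by the token total, longer spans by their context count.
                long denominator = span.size() == 1
                        ? totalTokens
                        : counts.get(span.subList(0, span.size() - 1));
                return logPenalty + Math.log((double) numerator / denominator);
            }
            logPenalty += Math.log(ALPHA); // one backoff step: drop the leftmost context word
        }
        // Word never seen; a real model would substitute an out-of-vocabulary score here.
        return Double.NEGATIVE_INFINITY;
    }
}
```

After adding the two sentences "the cat sat" and "the cat ran" to a sketch of order 3, logScore for "the cat sat" is ln(1/2), while "dog cat" backs off once and scores ln(0.4 * 2/6). Because the backed-off mass is never renormalized, the scores for a fixed context need not sum to one, which is why they are not probabilities. Dividing a natural-log score by Math.log(10) converts it to log10.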
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.ArrayEncodedNgramLanguageModel
ArrayEncodedNgramLanguageModel.DefaultImplementations
-
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
NgramLanguageModel.StaticMethods
-
-
Field Summary
Fields
Modifier and Type              Field    Description
protected NgramMap<LongRef>    map
-
Fields inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
lmOrder, oovWordLogProb
-
-
Constructor Summary
Constructors
Constructor    Description
StupidBackoffLm(int lmOrder, WordIndexer<W> wordIndexer, NgramMap<LongRef> map, ConfigOptions opts)
-
Method Summary
Modifier and Type    Method                                                Description
float                getLogProb(int[] ngram)                               Equivalent to getLogProb(ngram, 0, ngram.length)
float                getLogProb(int[] ngram, int startPos, int endPos)     Calculate language model score of an n-gram.
float                getLogProb(java.util.List<W> ngram)                   Scores an n-gram.
NgramMap<LongRef>    getNgramMap()
long                 getRawCount(int[] ngram, int startPos, int endPos)    Gets the raw count of an n-gram.
-
Methods inherited from class edu.berkeley.nlp.lm.AbstractArrayEncodedNgramLanguageModel
scoreSentence
-
Methods inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
getLmOrder, getWordIndexer, setOovWordLogProb
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
getLmOrder, getWordIndexer, scoreSentence, setOovWordLogProb
-
-
-
-
Constructor Detail
-
StupidBackoffLm
public StupidBackoffLm(int lmOrder, WordIndexer<W> wordIndexer, NgramMap<LongRef> map, ConfigOptions opts)
-
-
Method Detail
-
getLogProb
public float getLogProb(int[] ngram, int startPos, int endPos)
Description copied from interface: ArrayEncodedNgramLanguageModel
Calculate language model score of an n-gram. Warning: if you pass in an n-gram of length greater than getLmOrder(), this call will silently ignore the extra words of context. In other words, if you pass in a 5-gram (endPos - startPos == 5) to a 3-gram model, it will only score the words from startPos + 2 to endPos.
- Specified by:
getLogProb in interface ArrayEncodedNgramLanguageModel<W>
- Specified by:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
- Parameters:
ngram - array of words in integer representation
startPos - start of the portion of the array to be read
endPos - end of the portion of the array to be read.
- Returns:
-
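The warning about over-long n-grams comes down to index arithmetic, sketched below under the assumption that endPos is exclusive (as endPos - startPos == 5 for a 5-gram implies). NgramClipSketch, effectiveStart and scoredWindow are hypothetical helpers written for this illustration; they are not part of the library.

```java
import java.util.Arrays;

// Hypothetical helper illustrating which words a model of a given order reads.
final class NgramClipSketch {
    // First index a model of order lmOrder would read in [startPos, endPos).
    static int effectiveStart(int startPos, int endPos, int lmOrder) {
        return Math.max(startPos, endPos - lmOrder);
    }

    // The words that remain once the extra leading context is dropped.
    static int[] scoredWindow(int[] ngram, int startPos, int endPos, int lmOrder) {
        return Arrays.copyOfRange(ngram, effectiveStart(startPos, endPos, lmOrder), endPos);
    }
}
```

For a 5-gram passed to a 3-gram model, effectiveStart(0, 5, 3) is 2, matching the startPos + 2 in the description. A caller who wants to detect the truncation rather than have it happen silently can compare endPos - startPos against getLmOrder() before scoring.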
getRawCount
public long getRawCount(int[] ngram, int startPos, int endPos)
Gets the raw count of an n-gram.
- Parameters:
ngram -
startPos -
endPos -
- Returns:
- count of n-gram, or -1 if n-gram is not in the map.
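Since a missing n-gram is reported as -1 rather than 0, callers need to check for the sentinel before using the count in arithmetic. The sketch below mimics that contract with a toy in-memory table; RawCountSketch, NOT_FOUND and put are hypothetical names, and the HashMap is only a stand-in for the real NgramMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy count table with the same "-1 means absent" convention; not the library's code.
final class RawCountSketch {
    static final long NOT_FOUND = -1L;

    private final Map<List<Integer>, Long> counts = new HashMap<>();

    // Stores a count for the whole n-gram.
    void put(long count, int... ngram) {
        counts.put(key(ngram, 0, ngram.length), count);
    }

    // Looks up the words in [startPos, endPos); returns -1 when the n-gram is absent.
    long getRawCount(int[] ngram, int startPos, int endPos) {
        Long count = counts.get(key(ngram, startPos, endPos));
        return count == null ? NOT_FOUND : count;
    }

    private static List<Integer> key(int[] ngram, int startPos, int endPos) {
        List<Integer> k = new ArrayList<>(endPos - startPos);
        for (int i = startPos; i < endPos; i++) {
            k.add(ngram[i]);
        }
        return k;
    }
}
```

A caller would typically write something like `long c = lm.getRawCount(ngram, start, end); if (c < 0) { /* unseen n-gram */ }` so that the sentinel never ends up in a ratio or a sum.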
-
getLogProb
public float getLogProb(int[] ngram)
Description copied from interface: ArrayEncodedNgramLanguageModel
Equivalent to getLogProb(ngram, 0, ngram.length)
- Specified by:
getLogProb in interface ArrayEncodedNgramLanguageModel<W>
- Overrides:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
- See Also:
ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int)
-
getLogProb
public float getLogProb(java.util.List<W> ngram)
Description copied from interface: NgramLanguageModel
Scores an n-gram. This is a convenience method and will generally be relatively inefficient. More efficient versions are available in ArrayEncodedNgramLanguageModel.getLogProb(int[], int, int) and ContextEncodedNgramLanguageModel.getLogProb(long, int, int, edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel.LmContextInfo).
- Specified by:
getLogProb in interface NgramLanguageModel<W>
- Overrides:
getLogProb in class AbstractArrayEncodedNgramLanguageModel<W>
-
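One plausible reason the List-based call is slower is that every call has to map words to integers again. The sketch below shows the alternative pattern of encoding a sentence once and then sliding [startPos, endPos) windows over the resulting array. ListToArraySketch, indexOf, encode and windows are hypothetical names; the HashMap stands in for a WordIndexer, and no berkeleylm call is made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy word-to-int encoder plus the windows an array-based scorer would be given.
final class ListToArraySketch {
    private final Map<String, Integer> index = new HashMap<>();

    // Returns the id of a word, assigning the next free id on first sight.
    int indexOf(String word) {
        Integer id = index.get(word);
        if (id == null) {
            id = index.size();
            index.put(word, id);
        }
        return id;
    }

    // Encodes the sentence once so the int[] can be reused for every window.
    int[] encode(List<String> words) {
        int[] ids = new int[words.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = indexOf(words.get(i));
        }
        return ids;
    }

    // One {startPos, endPos} pair per word, each at most lmOrder words long.
    static List<int[]> windows(int length, int lmOrder) {
        List<int[]> spans = new ArrayList<>();
        for (int end = 1; end <= length; end++) {
            spans.add(new int[] {Math.max(0, end - lmOrder), end});
        }
        return spans;
    }
}
```

Each pair returned by windows could then be passed, together with the encoded array, to an array-based scorer such as getLogProb(int[] ngram, int startPos, int endPos).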
-