Package edu.berkeley.nlp.lm.cache
Class ContextEncodedCachingLmWrapper<T>
java.lang.Object
edu.berkeley.nlp.lm.AbstractNgramLanguageModel<T>
edu.berkeley.nlp.lm.AbstractContextEncodedNgramLanguageModel<T>
edu.berkeley.nlp.lm.cache.ContextEncodedCachingLmWrapper<T>
- Type Parameters:
T
-
- All Implemented Interfaces:
ContextEncodedNgramLanguageModel<T>
,NgramLanguageModel<T>
,Serializable
This class wraps a ContextEncodedNgramLanguageModel with a cache.
- Author:
- adampauls
- See Also:
-
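The wrap-with-a-cache pattern this class implements can be sketched with a self-contained toy. Everything below (CachingScorer, the stand-in log-prob function) is illustrative and not part of the berkeleylm API; the wrapper simply presents the same scoring interface while memoizing answers from the underlying model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongToDoubleFunction;

// Toy sketch of wrapping a slow scorer with a cache: repeated queries
// for the same key are answered from the map instead of the delegate.
class CachingScorer {
    private final LongToDoubleFunction delegate; // the "real" (slow) LM lookup
    private final Map<Long, Double> cache = new HashMap<>();
    private long misses = 0;

    CachingScorer(LongToDoubleFunction delegate) {
        this.delegate = delegate;
    }

    double score(long key) {
        Double hit = cache.get(key);
        if (hit != null) return hit;            // cache hit: no delegate call
        misses++;
        double v = delegate.applyAsDouble(key); // cache miss: query the model
        cache.put(key, v);
        return v;
    }

    long misses() { return misses; }
}

public class Demo {
    public static void main(String[] args) {
        CachingScorer lm = new CachingScorer(k -> -0.5 * k); // fake log-prob
        System.out.println(lm.score(4)); // prints -2.0 (miss)
        System.out.println(lm.score(4)); // prints -2.0 (hit, delegate not called)
        System.out.println(lm.misses()); // prints 1
    }
}
```

The real wrapper uses a fixed-size array cache (sized by the `cacheBits` parameter of the factory methods) rather than an unbounded map, but the hit/miss logic is the same idea.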
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.ContextEncodedNgramLanguageModel
ContextEncodedNgramLanguageModel.DefaultImplementations, ContextEncodedNgramLanguageModel.LmContextInfo
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
NgramLanguageModel.StaticMethods
-
Field Summary
Fields inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
lmOrder, oovWordLogProb
-
Method Summary
Modifier and Type | Method | Description
float | getLogProb(long contextOffset, int contextOrder, int word, ContextEncodedNgramLanguageModel.LmContextInfo contextOutput) | Get the score for an n-gram, and also get the context offset of the n-gram's suffix.
int[] | getNgramForOffset(long contextOffset, int contextOrder, int word) | Gets the n-gram referred to by a context-encoding.
ContextEncodedNgramLanguageModel.LmContextInfo | getOffsetForNgram(int[] ngram, int startPos, int endPos) | Gets the offset which refers to an n-gram.
WordIndexer<T> | getWordIndexer() | Each LM must have a WordIndexer which assigns integer IDs to each word in the language.
static <T> ContextEncodedCachingLmWrapper<T> | wrapWithCacheNotThreadSafe(ContextEncodedNgramLanguageModel<T> lm) | This type of caching is only threadsafe if you have one cache wrapper per thread.
static <T> ContextEncodedCachingLmWrapper<T> | wrapWithCacheNotThreadSafe(ContextEncodedNgramLanguageModel<T> lm, int cacheBits) |
static <T> ContextEncodedCachingLmWrapper<T> | wrapWithCacheThreadSafe(ContextEncodedNgramLanguageModel<T> lm) | This type of caching is threadsafe and (internally) maintains a separate cache for each thread that calls it.
static <T> ContextEncodedCachingLmWrapper<T> | wrapWithCacheThreadSafe(ContextEncodedNgramLanguageModel<T> lm, int cacheBits) |
Methods inherited from class edu.berkeley.nlp.lm.AbstractContextEncodedNgramLanguageModel
getLogProb, scoreSentence
Methods inherited from class edu.berkeley.nlp.lm.AbstractNgramLanguageModel
getLmOrder, setOovWordLogProb
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface edu.berkeley.nlp.lm.NgramLanguageModel
getLmOrder, setOovWordLogProb
-
Method Details
-
wrapWithCacheNotThreadSafe
public static <T> ContextEncodedCachingLmWrapper<T> wrapWithCacheNotThreadSafe(ContextEncodedNgramLanguageModel<T> lm)
This type of caching is only threadsafe if you have one cache wrapper per thread.
- Type Parameters:
T -
- Parameters:
lm -
- Returns:
wrapWithCacheNotThreadSafe
public static <T> ContextEncodedCachingLmWrapper<T> wrapWithCacheNotThreadSafe(ContextEncodedNgramLanguageModel<T> lm, int cacheBits)
wrapWithCacheThreadSafe
public static <T> ContextEncodedCachingLmWrapper<T> wrapWithCacheThreadSafe(ContextEncodedNgramLanguageModel<T> lm)
This type of caching is threadsafe and (internally) maintains a separate cache for each thread that calls it. Note that each thread has its own cache, so with many threads, memory usage could be substantial.
- Type Parameters:
T -
- Parameters:
lm -
- Returns:
wrapWithCacheThreadSafe
public static <T> ContextEncodedCachingLmWrapper<T> wrapWithCacheThreadSafe(ContextEncodedNgramLanguageModel<T> lm, int cacheBits)
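The per-thread strategy behind wrapWithCacheThreadSafe can be sketched with ThreadLocal. This is a conceptual illustration (PerThreadCache and its fake scoring function are not library code): each calling thread gets its own private table, so no locking is needed, at the cost of one cache's worth of memory per thread.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch: one cache instance per calling thread, so threads
// never contend on (or corrupt) a shared table.
class PerThreadCache {
    private final ThreadLocal<Map<Long, Double>> caches =
            ThreadLocal.withInitial(HashMap::new);

    double score(long key) {
        // Each thread sees only its own map; no synchronization needed.
        return caches.get().computeIfAbsent(key, k -> -0.5 * k); // fake score
    }

    int localSize() { return caches.get().size(); }
}

public class Demo {
    public static void main(String[] args) throws InterruptedException {
        PerThreadCache c = new PerThreadCache();
        c.score(1);
        c.score(2);
        Thread t = new Thread(() -> {
            c.score(3);
            // This worker's cache holds only its own entry.
            System.out.println(c.localSize()); // prints 1
        });
        t.start();
        t.join();
        System.out.println(c.localSize()); // prints 2: main thread's entries only
    }
}
```

This mirrors the documented trade-off above: wrapWithCacheNotThreadSafe skips the per-thread indirection (faster, but safe only with one wrapper per thread), while the thread-safe variant multiplies memory usage by the number of calling threads.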
getWordIndexer
public WordIndexer<T> getWordIndexer()
Description copied from interface: NgramLanguageModel
Each LM must have a WordIndexer which assigns integer IDs to each word in the language.
- Specified by:
getWordIndexer in interface NgramLanguageModel<T>
- Overrides:
getWordIndexer in class AbstractNgramLanguageModel<T>
- Returns:
getOffsetForNgram
public ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
Description copied from interface: ContextEncodedNgramLanguageModel
Gets the offset which refers to an n-gram. If the n-gram is not in the model, then it returns the shortest suffix of the n-gram which is. This operation is not necessarily fast.
- Specified by:
getOffsetForNgram in interface ContextEncodedNgramLanguageModel<T>
- Specified by:
getOffsetForNgram in class AbstractContextEncodedNgramLanguageModel<T>
-
getNgramForOffset
public int[] getNgramForOffset(long contextOffset, int contextOrder, int word)
Description copied from interface: ContextEncodedNgramLanguageModel
Gets the n-gram referred to by a context-encoding. This operation is not necessarily fast.
- Specified by:
getNgramForOffset in interface ContextEncodedNgramLanguageModel<T>
- Specified by:
getNgramForOffset in class AbstractContextEncodedNgramLanguageModel<T>
-
getLogProb
public float getLogProb(long contextOffset, int contextOrder, int word, ContextEncodedNgramLanguageModel.LmContextInfo contextOutput)
Description copied from interface: ContextEncodedNgramLanguageModel
Get the score for an n-gram, and also get the context offset of the n-gram's suffix.
- Specified by:
getLogProb in interface ContextEncodedNgramLanguageModel<T>
- Specified by:
getLogProb in class AbstractContextEncodedNgramLanguageModel<T>
- Parameters:
contextOffset - Offset of the context (prefix) of an n-gram
contextOrder - The (0-based) length of context (i.e. order == 0 iff context refers to a unigram)
word - Last word of the n-gram
contextOutput - Offset of the suffix of the input n-gram. If the parameter is null, it will be ignored. This can be passed to future queries for efficient access.
- Returns:
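The contextOutput parameter exists so that left-to-right scoring can be incremental: the suffix offset returned by one query becomes the contextOffset of the next, avoiding a fresh n-gram lookup per word. The loop can be sketched with a self-contained toy; ToyContextLm, its fake offset encoding, and the starting order of -1 for an empty context are all hypothetical stand-ins, not the library's actual encoding.

```java
// Toy illustration of threading a context offset through successive
// getLogProb-style queries. The "LM" is fake: scores and offsets are
// made up, only the query-chaining pattern is the point.
class ToyContextLm {
    static final class ContextInfo {
        long offset;
        int order;
    }

    // Score `word` given the context named by (contextOffset, contextOrder),
    // writing the suffix context for the next query into `out`.
    float getLogProb(long contextOffset, int contextOrder, int word, ContextInfo out) {
        float logProb = -1.0f * word;               // fake score
        if (out != null) {
            out.offset = contextOffset * 31 + word; // fake suffix encoding
            out.order = Math.min(contextOrder + 1, 2);
        }
        return logProb;
    }
}

public class Demo {
    public static void main(String[] args) {
        ToyContextLm lm = new ToyContextLm();
        ToyContextLm.ContextInfo ctx = new ToyContextLm.ContextInfo();
        int[] sentence = {3, 1, 4};
        float total = 0;
        long offset = 0;
        int order = -1; // empty context to start (hypothetical convention)
        for (int w : sentence) {
            total += lm.getLogProb(offset, order, w, ctx);
            offset = ctx.offset; // reuse the suffix context for the next word
            order = ctx.order;
        }
        System.out.println(total); // prints -8.0
    }
}
```

With the caching wrapper above, this chaining is exactly the access pattern that benefits: decoder-style loops re-query the same (offset, word) pairs constantly, so the cache absorbs most lookups.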