Class NgramFrequencyData


  • public final class NgramFrequencyData
    extends java.lang.Object
    Contains frequency information for n-grams coming from multiple LanguageProfiles.

    For each n-gram string it knows the locales (languages) in which it occurs, and how frequently it occurs in those languages in relation to other n-grams of the same length in those same languages.

    Immutable by contract rather than by enforcement (Java arrays cannot be made unmodifiable).

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private @NotNull java.util.List<LdLocale> langlist
      All the loaded languages, in exactly the same order as the data is in the double[] in wordLangProbMap.
      private @NotNull java.util.Map<java.lang.String,double[]> wordLangProbMap
      Key = n-gram; value = array with probabilities per loaded language, in the same order as langlist.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private NgramFrequencyData(@NotNull java.util.Map<java.lang.String,double[]> wordLangProbMap, @NotNull java.util.List<LdLocale> langlist)  
    • Field Detail

      • wordLangProbMap

        @NotNull
        private final @NotNull java.util.Map<java.lang.String,double[]> wordLangProbMap
        Key = n-gram; value = array with probabilities per loaded language, in the same order as langlist.
      • langlist

        @NotNull
        private final @NotNull java.util.List<LdLocale> langlist
        All the loaded languages, in exactly the same order as the data is in the double[] in wordLangProbMap. Example: if wordLangProbMap has an entry for the n-gram "foo", that entry's array holds one value for each locale in this langlist. Languages that don't know the n-gram have the value 0d.
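
The parallel layout described above can be sketched with plain JDK types. This is only an illustration, not the class's actual code: the class name `ParallelLayoutSketch`, the helper `probabilityFor`, the use of plain language-code strings in place of LdLocale, and all numeric values are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParallelLayoutSketch {
    // Hypothetical stand-in for langlist: plain language codes instead of LdLocale.
    static final List<String> LANGS = Arrays.asList("en", "de", "fr");

    // Hypothetical stand-in for wordLangProbMap, filled with made-up values.
    // Each array has one slot per entry of LANGS, in the same order.
    static Map<String, double[]> buildMap() {
        Map<String, double[]> map = new LinkedHashMap<>();
        map.put("foo", new double[] {0.02, 0.0, 0.01}); // "de" does not know "foo": 0d
        return map;
    }

    // Finds the value for one n-gram and one language through the shared index.
    // Returns 0d when the n-gram or the language is unknown.
    static double probabilityFor(Map<String, double[]> map, String ngram, String lang) {
        double[] probs = map.get(ngram);
        int index = LANGS.indexOf(lang);
        if (probs == null || index < 0) {
            return 0d;
        }
        return probs[index];
    }

    public static void main(String[] args) {
        Map<String, double[]> map = buildMap();
        for (String lang : LANGS) {
            System.out.println(lang + " -> " + probabilityFor(map, "foo", lang));
        }
    }
}
```

The position of a language in the list is the only link between the two structures, which is why the order of langlist must never change after the arrays are built.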
    • Constructor Detail

      • NgramFrequencyData

        private NgramFrequencyData(@NotNull java.util.Map<java.lang.String,double[]> wordLangProbMap,
                                   @NotNull java.util.List<LdLocale> langlist)
    • Method Detail

      • create

        @NotNull
        public static @NotNull NgramFrequencyData create(@NotNull java.util.Collection<LanguageProfile> languageProfiles,
                                                         @NotNull java.util.Collection<java.lang.Integer> gramLengths)
                                                  throws java.lang.IllegalArgumentException
        Parameters:
        gramLengths - for example [1,2,3]
        Throws:
        java.lang.IllegalArgumentException - if languageProfiles or gramLengths is empty, or if one of the languageProfiles does not have the grams of the required sizes.
      • getLanguageList

        @NotNull
        public @NotNull java.util.List<LdLocale> getLanguageList()
      • getLanguage

        @NotNull
        public @NotNull LdLocale getLanguage(int pos)
      • getProbabilities

        @Nullable
        public @org.jetbrains.annotations.Nullable double[] getProbabilities(java.lang.String ngram)
        Do not modify the returned array! (Java arrays cannot be made immutable.)
        Returns:
        null if no language profile knows that n-gram. Entries are 0 for languages that don't know that n-gram at all. The array is in the order of the getLanguageList() language list and has exactly that size. Implementation note: returning null lets the caller handle an unknown n-gram more efficiently than returning an empty array would.
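
A caller-side sketch of the contract above, assuming a plain `Map<String, double[]>` as a stand-in for this class: the class `NullableLookupSketch`, the helpers `lookup` and `sumKnown`, and the sample values are all hypothetical. Summing the values is used here only to show a read-only traversal; it is not claimed to be how the library scores text.

```java
import java.util.HashMap;
import java.util.Map;

public class NullableLookupSketch {
    // Hypothetical stand-in for getProbabilities(ngram):
    // returns null when no language profile knows the n-gram.
    static double[] lookup(Map<String, double[]> data, String ngram) {
        return data.get(ngram);
    }

    // Adds up the per-language values of every known n-gram.
    // Unknown n-grams are skipped with a single null check,
    // and the looked-up arrays are only read, never written.
    static double[] sumKnown(Map<String, double[]> data, String[] ngrams, int languageCount) {
        double[] totals = new double[languageCount];
        for (String ngram : ngrams) {
            double[] probs = lookup(data, ngram);
            if (probs == null) {
                continue; // no profile knows this n-gram
            }
            for (int i = 0; i < languageCount; i++) {
                totals[i] += probs[i];
            }
        }
        return totals;
    }

    // Made-up sample data for two languages.
    static Map<String, double[]> sampleData() {
        Map<String, double[]> data = new HashMap<>();
        data.put("ab", new double[] {0.5, 0.0});
        data.put("cd", new double[] {0.25, 0.25});
        return data;
    }

    public static void main(String[] args) {
        double[] totals = sumKnown(sampleData(), new String[] {"ab", "zz", "cd"}, 2);
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

Writing into a fresh `totals` array keeps the looked-up arrays untouched, which is what the "do not modify" note asks of callers.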