Class SynthDictionaryBuilder


  • final class SynthDictionaryBuilder
    extends DictionaryBuilder
    Creates a Morfologik binary synthesizer dictionary from plain text data.
    • Field Detail

      • POLISH_IGNORE_REGEX

        private static final java.lang.String POLISH_IGNORE_REGEX
        Forms whose POS tags indicate "unknown form", "foreign word", etc. are removed from the synthesizer dictionary. They only take up space and are unlikely ever to be used.
        See Also:
        Constant Field Values
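
The filtering idea described for POLISH_IGNORE_REGEX can be sketched as follows. This is a minimal illustration, not the class's actual code: the pattern below is a hypothetical stand-in (the real constant value is not shown on this page), and the tab-separated "form, lemma, POS tag" line layout is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PosTagFilterSketch {

  // Hypothetical pattern for illustration only; NOT the real POLISH_IGNORE_REGEX value.
  static final Pattern IGNORE = Pattern.compile("^(unknown|foreign)(:.*)?$");

  /** Returns the entries whose POS tag (assumed to be the third column) does not match the pattern. */
  static List<String> dropIgnored(List<String> lines, Pattern ignore) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      String[] parts = line.split("\t");
      // Assumed layout: inflected form, lemma, POS tag
      if (parts.length == 3 && ignore.matcher(parts[2]).find()) {
        continue; // such an entry would only take up space in the synthesizer dictionary
      }
      kept.add(line);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("domy\tdom\tsubst:pl:nom", "xyz\txyz\tunknown");
    System.out.println(dropIgnored(lines, IGNORE));
  }
}
```

Lines that do not have exactly three columns are passed through unchanged in this sketch; whether the real builder keeps or drops them is not stated here.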
    • Constructor Detail

      • SynthDictionaryBuilder

        SynthDictionaryBuilder​(java.io.File infoFile)
                        throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • main

        public static void main​(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception
      • build

        java.io.File build​(java.io.File plainTextDictFile,
                           java.io.File infoFile)
                    throws java.lang.Exception
        Throws:
        java.lang.Exception
      • getIgnoreItems

        private java.util.Set<java.lang.String> getIgnoreItems​(java.io.File file)
                                                        throws java.io.FileNotFoundException
        Throws:
        java.io.FileNotFoundException
      • getPosTagIgnoreRegex

        @Nullable
        private java.util.regex.Pattern getPosTagIgnoreRegex(java.io.File infoFile)
      • reverseLineContent

        private java.io.File reverseLineContent​(java.io.File plainTextDictFile,
                                                java.util.Set<java.lang.String> itemsToBeIgnored,
                                                java.util.regex.Pattern ignorePosRegex)
                                         throws java.io.IOException
        Throws:
        java.io.IOException
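
The page does not document what reverseLineContent produces, so the following is only a sketch of one plausible reading of its signature. It assumes tab-separated input lines of the form "inflected form, lemma, POS tag" and an output key of "lemma|tag" followed by the form. The class name, the in-memory list handling, and the "|" separator are assumptions made for illustration; the real method works on files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class ReverseLineSketch {

  /**
   * Turns "form<TAB>lemma<TAB>tag" into "lemma|tag<TAB>form", skipping lines listed in
   * itemsToBeIgnored and lines whose tag matches ignorePosRegex (which may be null).
   */
  static List<String> reverse(List<String> lines, Set<String> itemsToBeIgnored, Pattern ignorePosRegex) {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      if (itemsToBeIgnored.contains(line)) {
        continue; // explicitly excluded entry
      }
      String[] parts = line.split("\t");
      if (parts.length != 3) {
        continue; // this sketch skips malformed lines
      }
      String form = parts[0];
      String lemma = parts[1];
      String tag = parts[2];
      if (ignorePosRegex != null && ignorePosRegex.matcher(tag).find()) {
        continue; // tag marked as not worth storing
      }
      // Assumed separator between lemma and tag: "|"
      result.add(lemma + "|" + tag + "\t" + form);
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> ignored = new HashSet<>(Arrays.asList("skip\tme\tNN"));
    List<String> lines = Arrays.asList("houses\thouse\tNNS", "foo\tfoo\tUNKNOWN", "skip\tme\tNN");
    System.out.println(reverse(lines, ignored, Pattern.compile("UNKNOWN")));
  }
}
```

Keying on lemma and tag fits a synthesizer's use case, where the lookup goes from a base form plus a tag to an inflected form.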
      • getTagFile

        private java.io.File getTagFile​(java.io.File tempFile)
      • writePosTagsToFile

        private void writePosTagsToFile​(java.io.File plainTextDictFile,
                                        java.io.File tagFile)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • collectTags

        private java.util.Set<java.lang.String> collectTags​(java.io.File plainTextDictFile)
                                                     throws java.io.IOException
        Throws:
        java.io.IOException
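
As a rough illustration of what collecting tags from a plain text dictionary could look like, the sketch below gathers the distinct values of the third tab-separated column. The column layout, the in-memory input, and the sorted TreeSet are assumptions; the page only states that collectTags returns a Set of strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CollectTagsSketch {

  /** Collects the distinct POS tags, assumed to be the third tab-separated column of each line. */
  static Set<String> collectTags(List<String> lines) {
    Set<String> tags = new TreeSet<>(); // sorted only to make the output predictable
    for (String line : lines) {
      String[] parts = line.split("\t");
      if (parts.length == 3) {
        tags.add(parts[2]);
      }
    }
    return tags;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("houses\thouse\tNNS", "house\thouse\tNN", "mice\tmouse\tNNS");
    System.out.println(collectTags(lines));
  }
}
```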