Class BaseSynthesizer

  • All Implemented Interfaces:
    Synthesizer

    public class BaseSynthesizer
    extends java.lang.Object
    implements Synthesizer
    • Constructor Summary

      Constructors 
      Constructor Description
      BaseSynthesizer​(java.lang.String sorosFileName, java.lang.String resourceFileName, java.lang.String tagFileName, Language lang)  
      BaseSynthesizer​(java.lang.String resourceFileName, java.lang.String tagFileName, Language lang)  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private Soros createNumberSpeller​(java.lang.String langcode)  
      protected morfologik.stemming.IStemmer createStemmer()
      Creates a new IStemmer based on the configured dictionary.
      protected morfologik.stemming.Dictionary getDictionary()
      Returns the Dictionary used for this synthesizer.
      java.lang.String getPosTagCorrection​(java.lang.String posTag)
      Gets a corrected version of the POS tag used for synthesis.
      java.lang.String getSpelledNumber​(java.lang.String arabicNumeral)
      Spells out a number.
      morfologik.stemming.IStemmer getStemmer()  
      protected void initPossibleTags()  
      protected void lookup​(java.lang.String lemma, java.lang.String posTag, java.util.List<java.lang.String> results)
      Looks up the inflected forms of a lemma defined by a part-of-speech tag.
      java.lang.String[] synthesize​(AnalyzedToken token, java.lang.String posTag)
      Gets a form of a given AnalyzedToken, where the form is defined by a part-of-speech tag.
      java.lang.String[] synthesize​(AnalyzedToken token, java.lang.String posTag, boolean posTagRegExp)
      Generates a form of the word with a given POS tag for a given lemma.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • possibleTags

        protected volatile java.util.List<java.lang.String> possibleTags
      • tagFileName

        private final java.lang.String tagFileName
      • resourceFileName

        private final java.lang.String resourceFileName
      • stemmer

        private final morfologik.stemming.IStemmer stemmer
      • sorosFileName

        private final java.lang.String sorosFileName
      • numberSpeller

        private final Soros numberSpeller
      • dictionary

        private volatile morfologik.stemming.Dictionary dictionary
    • Constructor Detail

      • BaseSynthesizer

        public BaseSynthesizer​(java.lang.String sorosFileName,
                               java.lang.String resourceFileName,
                               java.lang.String tagFileName,
                               Language lang)
        Parameters:
        resourceFileName - The dictionary file name.
        tagFileName - The name of a file containing all possible tags.
      • BaseSynthesizer

        public BaseSynthesizer​(java.lang.String resourceFileName,
                               java.lang.String tagFileName,
                               Language lang)
    • Method Detail

      • getDictionary

        protected morfologik.stemming.Dictionary getDictionary()
                                                        throws java.io.IOException
        Returns the Dictionary used for this synthesizer. The dictionary file can be defined in the constructor.
        Throws:
        java.io.IOException - In case the dictionary cannot be loaded.
      • createStemmer

        protected morfologik.stemming.IStemmer createStemmer()
        Creates a new IStemmer based on the configured dictionary. The result must not be shared among threads.
        Since:
        2.3
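The rule that a stemmer must not be shared among threads can be honored by giving each thread its own instance. The following is a minimal sketch of that pattern using only the JDK. `FakeStemmer` and `PerThreadStemmerDemo` are hypothetical stand-ins, not part of this class or of morfologik; a real caller would presumably obtain each instance from createStemmer() instead.

```java
public class PerThreadStemmerDemo {

    // Hypothetical stand-in for a stemmer that is not thread-safe.
    public static final class FakeStemmer {
    }

    // Each thread lazily receives its own instance on first access.
    private final ThreadLocal<FakeStemmer> stemmers =
            ThreadLocal.withInitial(FakeStemmer::new);

    public FakeStemmer stemmerForCurrentThread() {
        return stemmers.get();
    }

    // Returns true if the calling thread keeps one instance across calls
    // and a second thread receives a different instance.
    public static boolean otherThreadGetsDifferentInstance() {
        PerThreadStemmerDemo demo = new PerThreadStemmerDemo();
        FakeStemmer mine = demo.stemmerForCurrentThread();
        FakeStemmer[] theirs = new FakeStemmer[1];
        Thread worker = new Thread(() -> theirs[0] = demo.stemmerForCurrentThread());
        worker.start();
        try {
            worker.join(); // join makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mine == demo.stemmerForCurrentThread() && mine != theirs[0];
    }

    public static void main(String[] args) {
        System.out.println(otherThreadGetsDifferentInstance()); // prints "true"
    }
}
```

A ThreadLocal keeps the per-thread instance alive as long as the thread lives, which suits long-running worker threads; short-lived tasks may simply create a fresh stemmer per task.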
      • createNumberSpeller

        private Soros createNumberSpeller​(java.lang.String langcode)
      • lookup

        protected void lookup​(java.lang.String lemma,
                              java.lang.String posTag,
                              java.util.List<java.lang.String> results)
        Looks up the inflected forms of a lemma defined by a part-of-speech tag.
        Parameters:
        lemma - the lemma to be inflected.
        posTag - the desired part-of-speech tag.
        results - the list to collect the inflected forms.
      • synthesize

        public java.lang.String[] synthesize​(AnalyzedToken token,
                                             java.lang.String posTag)
                                      throws java.io.IOException
        Gets a form of a given AnalyzedToken, where the form is defined by a part-of-speech tag.
        Specified by:
        synthesize in interface Synthesizer
        Parameters:
        token - AnalyzedToken to be inflected.
        posTag - The desired part-of-speech tag.
        Returns:
        inflected words, or an empty array if no forms were found
        Throws:
        java.io.IOException
      • synthesize

        public java.lang.String[] synthesize​(AnalyzedToken token,
                                             java.lang.String posTag,
                                             boolean posTagRegExp)
                                      throws java.io.IOException
        Description copied from interface: Synthesizer
        Generates a form of the word with a given POS tag for a given lemma. POS tag can be specified using regular expressions.
        Specified by:
        synthesize in interface Synthesizer
        Parameters:
        token - the token to be used for synthesis
        posTag - POS tag of the form to be generated
        posTagRegExp - Specifies whether the posTag string is a regular expression.
        Throws:
        java.io.IOException
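When posTagRegExp is true, the posTag string selects tags by pattern rather than by exact name. The sketch below illustrates the general idea with plain java.util.regex: it filters a list of known tags against an expression. It is not this class's implementation. The class name, the method name, the sample tags, and the choice to require a match on the whole tag are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PosTagRegexDemo {

    // Returns every tag that the regular expression matches in full.
    // Full matching (rather than substring search) is an assumption here.
    public static List<String> matchingTags(List<String> possibleTags, String posTagRegex) {
        Pattern pattern = Pattern.compile(posTagRegex);
        List<String> matches = new ArrayList<>();
        for (String tag : possibleTags) {
            if (pattern.matcher(tag).matches()) {
                matches.add(tag);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical tag list, standing in for the contents of a tag file.
        List<String> tags = List.of("NN", "NNS", "VB", "VBD", "JJ");
        System.out.println(matchingTags(tags, "NN.*"));  // prints [NN, NNS]
        System.out.println(matchingTags(tags, "VB|JJ")); // prints [VB, JJ]
    }
}
```

Under this reading, each matching tag would then be looked up together with the token's lemma, and the inflected forms found for all of them would be collected into the result.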
      • getPosTagCorrection

        public java.lang.String getPosTagCorrection​(java.lang.String posTag)
        Description copied from interface: Synthesizer
        Gets a corrected version of the POS tag used for synthesis. Useful when the tagset defines special disjunctions that need to be converted into regexp disjunctions.
        Specified by:
        getPosTagCorrection in interface Synthesizer
        Parameters:
        posTag - original POS tag to correct
        Returns:
        converted POS tag
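As an illustration of the kind of conversion described here, consider a hypothetical tagset that writes alternatives with a slash, such as "NN/NNS". A correction step could rewrite that into the regexp disjunction "NN|NNS". The slash convention, the class name, and the method below are assumptions made for this sketch; the notation a given language actually uses is not stated in this page.

```java
public class PosTagCorrectionDemo {

    // Rewrites a hypothetical slash-separated disjunction into a regexp
    // disjunction; tags without a slash are returned unchanged.
    public static String correct(String posTag) {
        if (posTag == null || !posTag.contains("/")) {
            return posTag;
        }
        return posTag.replace("/", "|");
    }

    public static void main(String[] args) {
        System.out.println(correct("NN/NNS")); // prints NN|NNS
        System.out.println(correct("VB"));     // prints VB
    }
}
```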
      • getStemmer

        public morfologik.stemming.IStemmer getStemmer()
        Returns:
        the stemmer interface to be used.
        Since:
        2.5
      • initPossibleTags

        protected void initPossibleTags()
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • getSpelledNumber

        public java.lang.String getSpelledNumber​(java.lang.String arabicNumeral)
        Description copied from interface: Synthesizer
        Spells out a number.
        Specified by:
        getSpelledNumber in interface Synthesizer
        Parameters:
        arabicNumeral - the number to spell out, written in Arabic numerals
        Returns:
        the spelled-out number as a String