Class MultiWordChunker

    • Field Detail

      • filename

        private final java.lang.String filename
      • allowFirstCapitalized

        private final boolean allowFirstCapitalized
      • mStartSpace

        private java.util.Map<java.lang.String, java.lang.Integer> mStartSpace
      • mStartNoSpace

        private java.util.Map<java.lang.String, java.lang.Integer> mStartNoSpace
      • mFull

        private java.util.Map<java.lang.String, java.lang.String> mFull
    • Constructor Detail

      • MultiWordChunker

        public MultiWordChunker(java.lang.String filename)
        Parameters:
        filename - text file containing multiwords and their tags
      • MultiWordChunker

        public MultiWordChunker(java.lang.String filename,
                                boolean allowFirstCapitalized)
        Parameters:
        filename - text file containing multiwords and their tags
        allowFirstCapitalized - if true, the first word of a multiword may be capitalized
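        The effect of allowFirstCapitalized can be illustrated with a minimal sketch. It is not the class's actual implementation: the class name FirstCapitalizedSketch, the method expand, and the idea of registering an extra capitalized key per entry are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FirstCapitalizedSketch {

    /**
     * Hypothetical helper: returns a copy of the multiword-to-tag map and, when
     * allowFirstCapitalized is true, also registers each multiword with its
     * first character upper-cased ("in spite of" -> "In spite of").
     */
    static Map<String, String> expand(Map<String, String> multiwords,
                                      boolean allowFirstCapitalized) {
        Map<String, String> out = new LinkedHashMap<>(multiwords);
        if (allowFirstCapitalized) {
            for (Map.Entry<String, String> e : multiwords.entrySet()) {
                String phrase = e.getKey();
                if (!phrase.isEmpty()) {
                    String capitalized =
                        Character.toUpperCase(phrase.charAt(0)) + phrase.substring(1);
                    // Keep an explicit entry from the file if one already exists.
                    out.putIfAbsent(capitalized, e.getValue());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> multiwords = new LinkedHashMap<>();
        multiwords.put("in spite of", "PREP"); // assumed sample entry and tag
        System.out.println(expand(multiwords, true).keySet());
        System.out.println(expand(multiwords, false).keySet());
    }
}
```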
    • Method Detail

      • lazyInit

        private void lazyInit()
      • disambiguate

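        public final AnalyzedSentence disambiguate(AnalyzedSentence input)
        Adds multiword POS tags to the sentence: the first token of a multiword receives an opening tag and the last token a closing tag, e.g., <ELLIPSIS> at the start of an ellipsis (...) and </ELLIPSIS> at its end.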
        Parameters:
        input - The tokens to be chunked.
        Returns:
        AnalyzedSentence with additional markers.
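        The start/end marking described for disambiguate can be sketched as follows. This is a simplified illustration under stated assumptions, not the class's actual code: Token is a hypothetical stand-in for the analyzed tokens of an AnalyzedSentence, and the lookup map, the maxLen limit, and the PREP tag are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiWordTagSketch {

    /** Hypothetical stand-in for an analyzed token: surface form plus attached tags. */
    static final class Token {
        final String text;
        final List<String> tags = new ArrayList<>();

        Token(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text + tags;
        }
    }

    /**
     * For every known multiword found in the word sequence, adds "<TAG>" to its
     * first token and "</TAG>" to its last token. Candidates are built by joining
     * up to maxLen consecutive words with single spaces.
     */
    static List<Token> mark(List<String> words, Map<String, String> multiwords, int maxLen) {
        List<Token> tokens = new ArrayList<>();
        for (String w : words) {
            tokens.add(new Token(w));
        }
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder candidate = new StringBuilder();
            for (int end = start; end < tokens.size() && end - start < maxLen; end++) {
                if (end > start) {
                    candidate.append(' ');
                }
                candidate.append(tokens.get(end).text);
                String tag = multiwords.get(candidate.toString());
                if (tag != null) {
                    tokens.get(start).tags.add("<" + tag + ">");
                    tokens.get(end).tags.add("</" + tag + ">");
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, String> multiwords = new HashMap<>();
        multiwords.put("in spite of", "PREP"); // assumed sample entry and tag
        List<String> words = Arrays.asList("He", "left", "in", "spite", "of", "rain");
        System.out.println(mark(words, multiwords, 3));
    }
}
```

        In this sketch only the first and last tokens of a match carry markers; tokens in the middle of the multiword are left unchanged.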
      • loadWords

        private java.util.List<java.lang.String> loadWords(java.io.InputStream stream)
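        Since loadWords is undocumented here, the following is only a guess at what a method with this signature might do: read the stream line by line and return the usable lines. The UTF-8 encoding, the skipping of blank lines and of lines starting with "#", and the tab-separated sample entry are all assumptions, not documented behavior.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LoadWordsSketch {

    /**
     * Hypothetical loader: returns the trimmed, non-empty lines of the stream,
     * skipping lines that start with '#' (assumed comment marker).
     */
    static List<String> loadWords(InputStream stream) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumed sample content: one comment, one blank line, one entry.
        String data = "# multiwords\n\nin spite of\tPREP\n";
        InputStream in = new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
        System.out.println(loadWords(in));
    }
}
```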