Package morfologik.stemming
Class DictionaryMetadata
- java.lang.Object
-
- morfologik.stemming.DictionaryMetadata
-
public final class DictionaryMetadata extends java.lang.Object
Description of attributes, their types and default values.
-
-
Field Summary
Fields
Columns: Modifier and Type, Field, Description
private java.util.EnumMap<DictionaryAttribute,java.lang.String>
attributes
All attributes.
private java.util.EnumMap<DictionaryAttribute,java.lang.Boolean>
boolAttributes
All "enabled" boolean attributes.
private java.nio.charset.Charset
charset
private static java.util.Map<DictionaryAttribute,java.lang.String>
DEFAULT_ATTRIBUTES
Default attribute values.
private EncoderType
encoderType
Sequence encoder.
private java.lang.String
encoding
Encoding used for converting bytes to characters and vice versa.
private java.util.LinkedHashMap<java.lang.Character,java.util.List<java.lang.Character>>
equivalentChars
Equivalent characters, treated as interchangeable in the same way as characters with and without diacritics.
private java.util.LinkedHashMap<java.lang.String,java.lang.String>
inputConversion
Conversion pairs for input conversion, for example to replace ligatures.
private java.util.Locale
locale
static java.lang.String
METADATA_FILE_EXTENSION
Expected metadata file extension.
private java.util.LinkedHashMap<java.lang.String,java.lang.String>
outputConversion
Conversion pairs for output conversion, for example to replace ligatures.
private java.util.LinkedHashMap<java.lang.String,java.util.List<java.lang.String>>
replacementPairs
Replacement pairs for non-obvious candidate search in a speller dictionary.
private static java.util.EnumSet<DictionaryAttribute>
REQUIRED_ATTRIBUTES
Required attributes.
private byte
separator
A separator character between fields (stem, lemma, form).
private char
separatorChar
-
Constructor Summary
Constructors
Columns: Constructor, Description
DictionaryMetadata(java.util.Map<DictionaryAttribute,java.lang.String> attrs)
Create an instance from an attribute map.
-
Method Summary
Columns: Modifier and Type, Method, Description
static DictionaryMetadataBuilder
builder()
java.util.Map<DictionaryAttribute,java.lang.String>
getAttributes()
java.nio.charset.CharsetDecoder
getDecoder()
java.nio.charset.CharsetEncoder
getEncoder()
java.lang.String
getEncoding()
java.util.LinkedHashMap<java.lang.Character,java.util.List<java.lang.Character>>
getEquivalentChars()
static java.lang.String
getExpectedMetadataFileName(java.lang.String dictionaryFile)
Returns the expected name of the metadata file, based on the name of the dictionary file.
static java.nio.file.Path
getExpectedMetadataLocation(java.nio.file.Path dictionary)
java.util.LinkedHashMap<java.lang.String,java.lang.String>
getInputConversionPairs()
java.util.Locale
getLocale()
java.util.LinkedHashMap<java.lang.String,java.lang.String>
getOutputConversionPairs()
java.util.LinkedHashMap<java.lang.String,java.util.List<java.lang.String>>
getReplacementPairs()
byte
getSeparator()
char
getSeparatorAsChar()
EncoderType
getSequenceEncoderType()
boolean
isConvertingCase()
boolean
isFrequencyIncluded()
boolean
isIgnoringAllUppercase()
boolean
isIgnoringCamelCase()
boolean
isIgnoringDiacritics()
boolean
isIgnoringNumbers()
boolean
isIgnoringPunctuation()
boolean
isSupportingRunOnWords()
static DictionaryMetadata
read(java.io.InputStream metadataStream)
Read dictionary metadata from a property file (stream).
void
write(java.io.Writer writer)
Write dictionary attributes (metadata).
-
-
-
Field Detail
-
DEFAULT_ATTRIBUTES
private static java.util.Map<DictionaryAttribute,java.lang.String> DEFAULT_ATTRIBUTES
Default attribute values.
-
REQUIRED_ATTRIBUTES
private static java.util.EnumSet<DictionaryAttribute> REQUIRED_ATTRIBUTES
Required attributes.
-
separator
private byte separator
A separator character between fields (stem, lemma, form). The character must be within byte range (FSA uses bytes internally).
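
The byte-range constraint can be checked up front. The following is a minimal sketch, not the library's own validation: the class `SeparatorCheck` and its method are hypothetical, and it assumes the separator is supplied as a `char` together with the dictionary's charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of morfologik): verifies that a separator
// character encodes to exactly one byte in the dictionary's charset.
final class SeparatorCheck {
  static byte toSingleByte(char separator, Charset charset) {
    try {
      ByteBuffer encoded = charset.newEncoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .encode(CharBuffer.wrap(new char[] {separator}));
      // The encoded buffer must hold exactly one byte to fit the "byte" field.
      if (encoded.remaining() != 1) {
        throw new IllegalArgumentException(
            "Separator must encode to a single byte in " + charset.name());
      }
      return encoded.get();
    } catch (CharacterCodingException e) {
      throw new IllegalArgumentException(
          "Separator cannot be encoded in " + charset.name(), e);
    }
  }

  public static void main(String[] args) {
    // '+' occupies a single byte in UTF-8, so this call succeeds.
    System.out.println(toSingleByte('+', StandardCharsets.UTF_8));
  }
}
```

A character such as ł takes two bytes in UTF-8, so the sketch rejects it as a separator for a UTF-8 dictionary.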
-
separatorChar
private char separatorChar
-
encoding
private java.lang.String encoding
Encoding used for converting bytes to characters and vice versa.
-
charset
private java.nio.charset.Charset charset
-
locale
private java.util.Locale locale
-
replacementPairs
private java.util.LinkedHashMap<java.lang.String,java.util.List<java.lang.String>> replacementPairs
Replacement pairs for non-obvious candidate search in a speller dictionary.
-
inputConversion
private java.util.LinkedHashMap<java.lang.String,java.lang.String> inputConversion
Conversion pairs for input conversion, for example to replace ligatures.
-
outputConversion
private java.util.LinkedHashMap<java.lang.String,java.lang.String> outputConversion
Conversion pairs for output conversion, for example to replace ligatures.
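
To illustrate how input and output conversion pairs might be used, here is a minimal sketch that applies each pair in insertion order with plain string replacement. The class `ConversionPairsSketch` is hypothetical, and the replacement strategy is an assumption; the library may apply the pairs differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of morfologik): applies conversion pairs
// to a word, one pair at a time, in the map's insertion order.
final class ConversionPairsSketch {
  static String applyPairs(String word, LinkedHashMap<String, String> pairs) {
    String result = word;
    for (Map.Entry<String, String> pair : pairs.entrySet()) {
      // Replace every occurrence of the key with its mapped value.
      result = result.replace(pair.getKey(), pair.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, String> pairs = new LinkedHashMap<>();
    pairs.put("\uFB01", "fi"); // the "fi" ligature expands to two letters
    System.out.println(applyPairs("\uFB01sh", pairs));
  }
}
```

A `LinkedHashMap` keeps the pairs in declaration order, which matters when an earlier replacement produces text that a later pair should also match.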
-
equivalentChars
private java.util.LinkedHashMap<java.lang.Character,java.util.List<java.lang.Character>> equivalentChars
Equivalent characters, treated as interchangeable in the same way as characters with and without diacritics. For example, Polish ł can be specified as equivalent to l. This implements a feature similar to the MAP directive in a hunspell affix file.
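
The shape of this map can be illustrated with a small lookup. This is a sketch only: the class `EquivalentCharsSketch` and its method are hypothetical, and the direction of the mapping (typed character to accepted alternatives) is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of morfologik): checks whether a candidate
// character is listed as an equivalent of the typed character.
final class EquivalentCharsSketch {
  static boolean isEquivalent(Map<Character, List<Character>> equivalents,
                              char typed, char candidate) {
    if (typed == candidate) {
      return true;
    }
    List<Character> alternatives = equivalents.get(typed);
    return alternatives != null && alternatives.contains(candidate);
  }

  public static void main(String[] args) {
    Map<Character, List<Character>> equivalents = new LinkedHashMap<>();
    // Declare the Polish letter U+0142 as an equivalent of plain 'l'.
    equivalents.put('l', Arrays.asList('\u0142'));
    System.out.println(isEquivalent(equivalents, 'l', '\u0142'));
  }
}
```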
-
attributes
private final java.util.EnumMap<DictionaryAttribute,java.lang.String> attributes
All attributes.
-
boolAttributes
private final java.util.EnumMap<DictionaryAttribute,java.lang.Boolean> boolAttributes
All "enabled" boolean attributes.
-
encoderType
private EncoderType encoderType
Sequence encoder.
-
METADATA_FILE_EXTENSION
public static final java.lang.String METADATA_FILE_EXTENSION
Expected metadata file extension.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
DictionaryMetadata
public DictionaryMetadata(java.util.Map<DictionaryAttribute,java.lang.String> attrs)
Create an instance from an attribute map.
- Parameters:
attrs - A set of DictionaryAttribute keys and their associated values.
- See Also:
DictionaryMetadataBuilder
-
-
Method Detail
-
getAttributes
public java.util.Map<DictionaryAttribute,java.lang.String> getAttributes()
- Returns:
- All metadata attributes.
-
getEncoding
public java.lang.String getEncoding()
-
getSeparator
public byte getSeparator()
-
getLocale
public java.util.Locale getLocale()
-
getInputConversionPairs
public java.util.LinkedHashMap<java.lang.String,java.lang.String> getInputConversionPairs()
-
getOutputConversionPairs
public java.util.LinkedHashMap<java.lang.String,java.lang.String> getOutputConversionPairs()
-
getReplacementPairs
public java.util.LinkedHashMap<java.lang.String,java.util.List<java.lang.String>> getReplacementPairs()
-
getEquivalentChars
public java.util.LinkedHashMap<java.lang.Character,java.util.List<java.lang.Character>> getEquivalentChars()
-
isFrequencyIncluded
public boolean isFrequencyIncluded()
-
isIgnoringPunctuation
public boolean isIgnoringPunctuation()
-
isIgnoringNumbers
public boolean isIgnoringNumbers()
-
isIgnoringCamelCase
public boolean isIgnoringCamelCase()
-
isIgnoringAllUppercase
public boolean isIgnoringAllUppercase()
-
isIgnoringDiacritics
public boolean isIgnoringDiacritics()
-
isConvertingCase
public boolean isConvertingCase()
-
isSupportingRunOnWords
public boolean isSupportingRunOnWords()
-
getDecoder
public java.nio.charset.CharsetDecoder getDecoder()
- Returns:
- A new CharsetDecoder for the encoding.
-
getEncoder
public java.nio.charset.CharsetEncoder getEncoder()
- Returns:
- A new CharsetEncoder for the encoding.
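
Both accessors return a new object on every call, which matters because `CharsetDecoder` and `CharsetEncoder` instances are stateful and not safe for concurrent use. The pattern can be sketched with the standard library alone. The class `CodecSketch` is hypothetical, and the `REPORT` error actions are an assumption about a sensible configuration, not a statement about what the library sets.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical holder (not part of morfologik): resolves the charset once
// and hands out a fresh decoder or encoder on each request.
final class CodecSketch {
  private final Charset charset;

  CodecSketch(String encoding) {
    this.charset = Charset.forName(encoding);
  }

  CharsetDecoder newDecoder() {
    // Assumed policy: fail loudly on bad input instead of substituting.
    return charset.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
  }

  CharsetEncoder newEncoder() {
    return charset.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
  }

  // Encodes and decodes the text with freshly created codec instances.
  String roundTrip(String text) {
    try {
      ByteBuffer bytes = newEncoder().encode(CharBuffer.wrap(text));
      return newDecoder().decode(bytes).toString();
    } catch (CharacterCodingException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Callers that process many words on one thread can keep and reuse a single decoder or encoder; separate threads should each request their own.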
-
getSequenceEncoderType
public EncoderType getSequenceEncoderType()
- Returns:
- The sequence encoder type.
-
getSeparatorAsChar
public char getSeparatorAsChar()
-
builder
public static DictionaryMetadataBuilder builder()
- Returns:
- A DictionaryMetadataBuilder; this method is a shortcut for obtaining one.
-
getExpectedMetadataFileName
public static java.lang.String getExpectedMetadataFileName(java.lang.String dictionaryFile)
Returns the expected name of the metadata file, based on the name of the dictionary file. The expected name is resolved by truncating any file extension of dictionaryFile and appending METADATA_FILE_EXTENSION.
- Parameters:
dictionaryFile - The name of the dictionary (*.dict) file.
- Returns:
- The expected name of the metadata file.
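
The resolution rule described above can be sketched as follows. The class `MetadataNameSketch` is hypothetical. The extension value `"info"` and the dot inserted before it are assumptions for illustration, since this page does not state the constant's value.

```java
// Hypothetical re-implementation (not the library's code) of the naming rule:
// drop any file extension, then append the metadata extension.
final class MetadataNameSketch {
  // Assumed value; consult the Constant Field Values page for the real one.
  static final String METADATA_FILE_EXTENSION = "info";

  static String expectedMetadataFileName(String dictionaryFile) {
    int dot = dictionaryFile.lastIndexOf('.');
    // Without a dot there is no extension to truncate.
    String base = dot >= 0 ? dictionaryFile.substring(0, dot) : dictionaryFile;
    return base + "." + METADATA_FILE_EXTENSION;
  }

  public static void main(String[] args) {
    System.out.println(expectedMetadataFileName("polish.dict"));
  }
}
```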
-
getExpectedMetadataLocation
public static java.nio.file.Path getExpectedMetadataLocation(java.nio.file.Path dictionary)
- Parameters:
dictionary - The location of the dictionary file.
- Returns:
- The expected location of a metadata file.
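
A plausible reading is that the metadata file sits next to the dictionary file. The sketch below assumes exactly that, and also assumes an `"info"` extension. The class `MetadataLocationSketch` is hypothetical and is not the library's implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of morfologik): derives a sibling path whose
// file name has the dictionary's extension swapped for an assumed ".info".
final class MetadataLocationSketch {
  static Path expectedMetadataLocation(Path dictionary) {
    // Note: getFileName() returns null for a root path; not handled here.
    String fileName = dictionary.getFileName().toString();
    int dot = fileName.lastIndexOf('.');
    String base = dot >= 0 ? fileName.substring(0, dot) : fileName;
    // resolveSibling keeps the parent directory and replaces the last element.
    return dictionary.resolveSibling(base + ".info");
  }

  public static void main(String[] args) {
    System.out.println(expectedMetadataLocation(Paths.get("dicts", "polish.dict")));
  }
}
```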
-
read
public static DictionaryMetadata read(java.io.InputStream metadataStream) throws java.io.IOException
Read dictionary metadata from a property file (stream).
- Parameters:
metadataStream - The stream with metadata.
- Returns:
- DictionaryMetadata read from the stream (property file).
- Throws:
java.io.IOException - Thrown if an I/O exception occurs.
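
Because the metadata is stored as a property file, the underlying parsing step can be illustrated with `java.util.Properties`. This is a sketch under assumptions: the class `MetadataReadSketch` is hypothetical, the UTF-8 stream encoding is assumed, and the property keys in the usage example are illustrative rather than taken from this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper (not part of morfologik): loads key=value metadata
// from a stream, mirroring the "property file (stream)" input format.
final class MetadataReadSketch {
  static Properties readProperties(InputStream metadataStream) throws IOException {
    Properties properties = new Properties();
    // Assumption: the property file is encoded as UTF-8.
    properties.load(new InputStreamReader(metadataStream, StandardCharsets.UTF_8));
    return properties;
  }

  // Convenience wrapper for in-memory content; converts the checked exception.
  static Properties readFromString(String content) {
    byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
    try (InputStream in = new ByteArrayInputStream(bytes)) {
      return readProperties(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Illustrative keys; the real attribute names are defined by DictionaryAttribute.
    Properties p = readFromString("fsa.dict.separator=+\nfsa.dict.encoding=UTF-8\n");
    System.out.println(p.getProperty("fsa.dict.encoding"));
  }
}
```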
-
write
public void write(java.io.Writer writer) throws java.io.IOException
Write dictionary attributes (metadata).
- Parameters:
writer - The writer to write to.
- Throws:
java.io.IOException - Thrown when an I/O error occurs.
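
Writing is the mirror image of reading a property file. A minimal sketch, assuming the attributes are serialized as plain key=value properties: the class `MetadataWriteSketch` is hypothetical, string keys stand in for `DictionaryAttribute`, and the key used in the example is illustrative.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not part of morfologik): stores an attribute map
// in java.util.Properties format on the given writer.
final class MetadataWriteSketch {
  static void write(Map<String, String> attributes, Writer writer) throws IOException {
    Properties properties = new Properties();
    for (Map.Entry<String, String> attribute : attributes.entrySet()) {
      properties.setProperty(attribute.getKey(), attribute.getValue());
    }
    // store() also emits the comment and a timestamp line before the entries.
    properties.store(writer, "Dictionary metadata (sketch)");
  }

  // Convenience wrapper that returns the serialized text.
  static String writeToString(Map<String, String> attributes) {
    StringWriter out = new StringWriter();
    try {
      write(attributes, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }
}
```

`Properties.store` does not preserve the insertion order of the entries, so output written this way should not be compared line by line against a hand-edited file.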
-
-