Package morfologik.stemming
Class TrimPrefixAndSuffixEncoder
- java.lang.Object
  - morfologik.stemming.TrimPrefixAndSuffixEncoder

All Implemented Interfaces: ISequenceEncoder
public class TrimPrefixAndSuffixEncoder extends java.lang.Object implements ISequenceEncoder
Encodes dst relative to src by trimming whatever non-equal suffix and prefix src and dst have. The output code is (bytes): {P}{K}{suffix}, where (P - 'A') bytes should be trimmed from the start of src, (K - 'A') bytes should be trimmed from the end of src, and then the suffix should be appended to the resulting byte sequence.

Examples:
src: abc  dst: abcd  encoded: AAd
src: abc  dst: xyz  encoded: ADxyz
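The decoding rule described above can be sketched in a few lines. This is a minimal, standalone illustration of the {P}{K}{suffix} format, not the library's implementation: the class name `TrimCodeSketch` and the method `apply` are hypothetical, and clamping an over-long trim to zero kept bytes is an assumption made here for safety.

```java
import java.nio.charset.StandardCharsets;

final class TrimCodeSketch {
    /** Applies a {P}{K}{suffix} code to src, following the class description. */
    static byte[] apply(byte[] src, byte[] code) {
        int trimStart = (code[0] & 0xFF) - 'A';   // P - 'A' bytes dropped from the start
        int trimEnd = (code[1] & 0xFF) - 'A';     // K - 'A' bytes dropped from the end
        // Assumption: if the trims exceed src, nothing of src is kept.
        int keep = Math.max(0, src.length - trimStart - trimEnd);
        int suffixLen = code.length - 2;

        byte[] out = new byte[keep + suffixLen];
        if (keep > 0) {
            System.arraycopy(src, trimStart, out, 0, keep);
        }
        System.arraycopy(code, 2, out, keep, suffixLen);
        return out;
    }

    /** Convenience wrapper for trying the examples with strings. */
    static String applyToString(String src, String code) {
        byte[] out = apply(src.getBytes(StandardCharsets.UTF_8),
                           code.getBytes(StandardCharsets.UTF_8));
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(applyToString("abc", "AAd"));   // prints "abcd"
        System.out.println(applyToString("abc", "ADxyz")); // prints "xyz"
    }
}
```

Both calls reproduce the two examples from the class description: `AAd` trims nothing and appends `d`, while `ADxyz` trims all three bytes from the end and appends `xyz`.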
Field Summary
- private static int REMOVE_EVERYTHING
  Maximum encodable single-byte code.
Constructor Summary
- TrimPrefixAndSuffixEncoder()
Method Summary
- java.nio.ByteBuffer decode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer encoded)
  Decodes encoded relative to source, optionally reusing the provided ByteBuffer.
- java.nio.ByteBuffer encode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer target)
  Encodes target relative to source, optionally reusing the provided ByteBuffer.
- int prefixBytes()
  The number of prefix bytes of the encoded form that should be ignored (needed for separator lookup).
- java.lang.String toString()
Field Detail

REMOVE_EVERYTHING
private static final int REMOVE_EVERYTHING
Maximum encodable single-byte code.
- See Also: Constant Field Values
Method Detail

encode
public java.nio.ByteBuffer encode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer target)
Description copied from interface: ISequenceEncoder
Encodes target relative to source, optionally reusing the provided ByteBuffer.
- Specified by: encode in interface ISequenceEncoder
- Parameters:
  reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
  source - The source byte sequence.
  target - The target byte sequence to encode relative to source.
- Returns: The ByteBuffer with the encoded target.
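One way to choose the two trim counts is to try every starting offset in the source and keep the one that shares the longest run of bytes with the start of the target. The sketch below works on plain byte arrays and is only an illustration under that assumption. It is not taken from the library: the class `TrimEncodeSketch` is hypothetical, and the handling of trims too large for a single byte (the role of REMOVE_EVERYTHING, whose value is not given here) is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

final class TrimEncodeSketch {
    /** Builds a {P}{K}{suffix} code that turns src into dst. */
    static byte[] encode(byte[] src, byte[] dst) {
        int bestStart = 0;
        int bestLen = 0;
        // For each offset in src, measure how many bytes match the start of dst.
        for (int start = 0; start < src.length; start++) {
            int len = 0;
            while (start + len < src.length && len < dst.length
                    && src[start + len] == dst[len]) {
                len++;
            }
            if (len > bestLen) {
                bestLen = len;
                bestStart = start;
            }
        }

        int trimStart = bestStart;                        // bytes cut from the start of src
        int trimEnd = src.length - (bestStart + bestLen); // bytes cut from the end of src
        int suffixLen = dst.length - bestLen;             // rest of dst not covered by src

        // Assumption: both trim counts are small enough to fit in one byte.
        byte[] code = new byte[2 + suffixLen];
        code[0] = (byte) ('A' + trimStart);
        code[1] = (byte) ('A' + trimEnd);
        System.arraycopy(dst, bestLen, code, 2, suffixLen);
        return code;
    }

    /** Convenience wrapper for trying the sketch with strings. */
    static String encodeToString(String src, String dst) {
        byte[] code = encode(src.getBytes(StandardCharsets.UTF_8),
                             dst.getBytes(StandardCharsets.UTF_8));
        return new String(code, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeToString("abc", "abcd")); // prints "AAd"
        System.out.println(encodeToString("abc", "xyz"));  // prints "ADxyz"
    }
}
```

When the source and target share nothing, the loop leaves the best run at length zero, so the whole source is trimmed from the end and the whole target becomes the suffix.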
prefixBytes
public int prefixBytes()
Description copied from interface: ISequenceEncoder
The number of prefix bytes of the encoded form that should be ignored (needed for separator lookup). An ugly workaround for GH-85; it should be fixed by knowing in advance whether the dictionary contains tags, so that the separator can be scanned for right-to-left.
- Specified by: prefixBytes in interface ISequenceEncoder
- See Also: "https://github.com/morfologik/morfologik-stemming/issues/85"
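To illustrate why a separator lookup might need to skip the leading code bytes, here is a small sketch. Everything in it is an assumption made for illustration: the class `SeparatorScanSketch`, the method `indexOfSeparator`, the use of `B` as a separator byte, and the prefix count of 2 (inferred from the {P}{K} layout, not stated on this page). It is one plausible reading of the description, not the library's lookup code.

```java
import java.nio.charset.StandardCharsets;

final class SeparatorScanSketch {
    /**
     * Finds the first separator byte at or after index prefixBytes,
     * or returns -1 if there is none. The skipped bytes are treated as
     * opaque codes that may happen to equal the separator.
     */
    static int indexOfSeparator(byte[] data, int prefixBytes, byte separator) {
        for (int i = prefixBytes; i < data.length; i++) {
            if (data[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical record: code "BBd", then separator 'B', then a tag.
        byte[] record = "BBdBtag".getBytes(StandardCharsets.UTF_8);
        byte separator = (byte) 'B';

        System.out.println(indexOfSeparator(record, 0, separator)); // prints 0 (hits a code byte)
        System.out.println(indexOfSeparator(record, 2, separator)); // prints 3 (the real separator)
    }
}
```

A left-to-right scan that starts at index 0 stops on a code byte, while starting after the assumed prefix finds the intended position.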
decode
public java.nio.ByteBuffer decode(java.nio.ByteBuffer reuse, java.nio.ByteBuffer source, java.nio.ByteBuffer encoded)
Description copied from interface: ISequenceEncoder
Decodes encoded relative to source, optionally reusing the provided ByteBuffer.
- Specified by: decode in interface ISequenceEncoder
- Parameters:
  reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
  source - The source byte sequence.
  encoded - The previously encoded byte sequence.
- Returns: The ByteBuffer with the decoded target.
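The reuse parameter suggests a common java.nio pattern: write into the caller's buffer when it is large enough, otherwise allocate a new one. The sketch below shows that pattern with a decode step shaped like the signature above. It is a hedged illustration only: the class `ReuseBufferSketch` and helper `prepare` are hypothetical, and accepting a null reuse buffer, comparing against capacity, and leaving the input positions untouched are all assumptions rather than documented behavior.

```java
import java.nio.ByteBuffer;

final class ReuseBufferSketch {
    /** Returns reuse (cleared) if it can hold `needed` bytes, else a new buffer. */
    static ByteBuffer prepare(ByteBuffer reuse, int needed) {
        if (reuse == null || reuse.capacity() < needed) {
            return ByteBuffer.allocate(needed);
        }
        reuse.clear();
        return reuse;
    }

    /** Decodes a {P}{K}{suffix} code relative to source, reusing a buffer if possible. */
    static ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) {
        int srcPos = source.position();
        int encPos = encoded.position();
        int trimStart = (encoded.get(encPos) & 0xFF) - 'A';
        int trimEnd = (encoded.get(encPos + 1) & 0xFF) - 'A';
        int keep = Math.max(0, source.remaining() - trimStart - trimEnd);
        int suffixLen = encoded.remaining() - 2;

        ByteBuffer out = prepare(reuse, keep + suffixLen);
        // Absolute gets leave the positions of source and encoded unchanged.
        for (int i = 0; i < keep; i++) {
            out.put(source.get(srcPos + trimStart + i));
        }
        for (int i = 0; i < suffixLen; i++) {
            out.put(encoded.get(encPos + 2 + i));
        }
        out.flip(); // make the decoded bytes readable from position 0
        return out;
    }
}
```

A caller that decodes many entries in a loop can pass the previously returned buffer back in as `reuse`, which avoids a fresh allocation whenever the result fits.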
toString
public java.lang.String toString()
- Overrides: toString in class java.lang.Object