Package com.aowagie.text.pdf.hyphenation
Class HyphenationTree
java.lang.Object
  com.aowagie.text.pdf.hyphenation.TernaryTree
    com.aowagie.text.pdf.hyphenation.HyphenationTree
All Implemented Interfaces:
PatternConsumer, java.io.Serializable, java.lang.Cloneable
class HyphenationTree extends TernaryTree implements PatternConsumer
This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word.
Field Summary
private TernaryTree classmap
    This map stores the character classes.
private TernaryTree ivalues
    Temporary map to store interletter values on pattern loading.
private static long serialVersionUID
private java.util.HashMap stoplist
    This map stores hyphenation exceptions.
private ByteVector vspace
    Value space: stores the interletter values.
Constructor Summary
HyphenationTree()
Method Summary
void addClass(java.lang.String chargroup)
    Add a character class to the tree.
void addException(java.lang.String word, java.util.ArrayList hyphenatedword)
    Add an exception to the tree.
void addPattern(java.lang.String pattern, java.lang.String ivalue)
    Add a pattern to the tree.
private byte[] getValues(int k)
private int hstrcmp(char[] s, int si, char[] t, int ti)
    String comparison: returns 0 if the strings are equal or if t is a substring of s.
private Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
    Hyphenate a word and return an array of hyphenation points.
(package private) Hyphenation hyphenate(java.lang.String word, int remainCharCount, int pushCharCount)
    Hyphenate a word and return a Hyphenation object.
(package private) void loadSimplePatterns(java.io.InputStream stream)
private int packValues(java.lang.String values)
    Packs the values by storing them in 4 bits, two values per byte. Values range from 0 to 9.
void printStats()
private void searchPatterns(char[] word, int index, byte[] il)
    Search for all possible partial matches of word starting at index and update the interletter values.
Methods inherited from class com.aowagie.text.pdf.hyphenation.TernaryTree
clone, find, find, insert, insert, trimToSize
Field Detail
serialVersionUID
private static final long serialVersionUID
See Also:
Constant Field Values
vspace
private final ByteVector vspace
value space: stores the interletter values
stoplist
private final java.util.HashMap stoplist
This map stores hyphenation exceptions
classmap
private final TernaryTree classmap
This map stores the character classes
ivalues
private transient TernaryTree ivalues
Temporary map to store interletter values on pattern loading.
Method Detail
packValues
private int packValues(java.lang.String values)
Packs the values by storing them in 4 bits, two values per byte. Values range from 0 to 9. We use zero as the terminator, so we add 1 to each value.
Parameters:
values - a string of digits from '0' to '9' representing the interletter values.
Returns:
the index into the vspace array where the packed values are stored.
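The packing scheme above can be sketched as follows. This is a minimal illustration, assuming a plain `byte[]` in place of the class's `ByteVector` (so it returns the packed bytes rather than an index into `vspace`). `NibblePacking`, `pack` and `unpack` are hypothetical names; `unpack` only shows how the terminator makes decoding possible and is not claimed to be how `getValues` is implemented.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper illustrating 4-bit packing of interletter values.
final class NibblePacking {

    /** Packs digits '0'..'9' as (value + 1) nibbles, two per byte, high nibble first. */
    static byte[] pack(String values) {
        int n = values.length();
        // n nibbles fit in ceil(n / 2) bytes; n / 2 + 1 always leaves room for a zero
        // terminator (a whole zero byte if n is even, a zero low nibble if n is odd).
        byte[] out = new byte[n / 2 + 1];
        for (int i = 0; i < n; i++) {
            // Store value + 1 so that a zero nibble can mark the end of the values.
            int v = (values.charAt(i) - '0' + 1) & 0x0f;
            if ((i & 1) == 0) {
                out[i >> 1] = (byte) (v << 4);          // even index: high nibble
            } else {
                out[i >> 1] = (byte) (out[i >> 1] | v); // odd index: low nibble
            }
        }
        return out;
    }

    /** Decodes nibbles until the zero terminator, subtracting 1 from each. */
    static byte[] unpack(byte[] packed) {
        ByteArrayOutputStream res = new ByteArrayOutputStream();
        for (byte b : packed) {
            int hi = (b >>> 4) & 0x0f; // mask away the sign extension of the byte
            if (hi == 0) {
                break;
            }
            res.write(hi - 1);
            int lo = b & 0x0f;
            if (lo == 0) {
                break;
            }
            res.write(lo - 1);
        }
        return res.toByteArray();
    }
}
```

For example, the values "010" become the nibbles 1, 2, 1, which pack into the two bytes 0x12 and 0x10; the zero low nibble of the second byte terminates the sequence.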
loadSimplePatterns
void loadSimplePatterns(java.io.InputStream stream)
hstrcmp
private int hstrcmp(char[] s, int si, char[] t, int ti)
String comparison: returns 0 if the strings are equal or if t is a substring of s.
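A minimal sketch of such a comparison, assuming both arrays are null-terminated ('\0'), like the word passed to searchPatterns, and reading "t is a substring of s" as "t is a prefix of s from the given offsets". The class name `Hstrcmp` and the non-zero return value (difference of the first mismatching characters) are assumptions for illustration only.

```java
// Hypothetical stand-alone version of a null-terminated prefix comparison.
final class Hstrcmp {

    /**
     * Compares s (from si) with t (from ti). Returns 0 if both end together
     * (equal) or if t ends first (t is a prefix of s); otherwise returns the
     * difference of the first mismatching characters.
     */
    static int hstrcmp(char[] s, int si, char[] t, int ti) {
        for (; s[si] == t[ti]; si++, ti++) {
            if (s[si] == 0) {
                return 0; // both strings reached their terminator together
            }
        }
        if (t[ti] == 0) {
            return 0; // t ran out while still matching s
        }
        return s[si] - t[ti];
    }
}
```

With this reading, comparing "abc" against "ab" yields 0, while comparing "ab" against "abc" yields a negative value because s ends before t does.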
getValues
private byte[] getValues(int k)
searchPatterns
private void searchPatterns(char[] word, int index, byte[] il)
Search for all possible partial matches of word starting at index and update the interletter values. In other words, it does something like:
    for (i = 0; i < patterns.length; i++) {
        if (word.substring(index).startsWith(patterns[i]))
            update_interletter_values(patterns[i]);
    }
But it is done in an efficient way, since the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000, so doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.
Parameters:
word - null-terminated word to match
index - start index from word
il - interletter values array to update
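The single-walk search described above can be sketched with a small ternary search tree. This is an assumption-laden illustration: it uses object nodes (`TstPatternSearch` and `Node` are hypothetical), whereas the real TernaryTree packs nodes into arrays, and it bounds the walk by the array length instead of a null terminator. Matches are merged with the usual Liang rule of keeping the maximum value at each interletter position.

```java
// Hypothetical object-based ternary search tree; the real class packs nodes into arrays.
final class TstPatternSearch {

    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // lo/hi: siblings with smaller/larger chars; eq: next char
        byte[] values;     // non-null when a pattern ends at this node

        Node(char ch) {
            this.ch = ch;
        }
    }

    private Node root;

    /** Stores a non-empty pattern with its interletter values (letters.length() + 1 of them). */
    void insert(String letters, byte[] values) {
        root = insert(root, letters, 0, values);
    }

    private Node insert(Node n, String s, int i, byte[] values) {
        char c = s.charAt(i);
        if (n == null) {
            n = new Node(c);
        }
        if (c < n.ch) {
            n.lo = insert(n.lo, s, i, values);
        } else if (c > n.ch) {
            n.hi = insert(n.hi, s, i, values);
        } else if (i + 1 < s.length()) {
            n.eq = insert(n.eq, s, i + 1, values);
        } else {
            n.values = values;
        }
        return n;
    }

    /**
     * One walk down the tree finds every stored pattern that is a prefix of
     * word[index..]; each match raises il[index + j] to at least values[j].
     * il must have room for word.length + 1 entries.
     */
    void searchPatterns(char[] word, int index, byte[] il) {
        Node n = root;
        int i = index;
        while (n != null && i < word.length) {
            char c = word[i];
            if (c < n.ch) {
                n = n.lo;
            } else if (c > n.ch) {
                n = n.hi;
            } else {
                if (n.values != null) {
                    for (int j = 0; j < n.values.length; j++) {
                        il[index + j] = (byte) Math.max(il[index + j], n.values[j]);
                    }
                }
                n = n.eq;
                i++;
            }
        }
    }
}
```

Calling searchPatterns once per starting position of "abc" with the patterns a1b and b2c yields the values 0, 1, 2, 0: the odd 1 permits a break after "a", while the even 2 suppresses one after "b".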
hyphenate
Hyphenation hyphenate(java.lang.String word, int remainCharCount, int pushCharCount)
Hyphenate word and return a Hyphenation object.
Parameters:
word - the word to be hyphenated
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.
Returns:
a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
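A minimal sketch of how remainCharCount and pushCharCount can filter candidate break points, assuming (hypothetically) that il[i] holds the interletter value in front of letter i of a word with len letters and that, as in Liang's scheme, an odd value allows a break. `BreakPoints` and `breakPoints` are illustrative names; the class's real index bookkeeping is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns interletter values into allowed break offsets.
final class BreakPoints {

    /**
     * il.length == len + 1; a break at offset i splits the word into
     * i letters before and len - i letters after the hyphen.
     */
    static List<Integer> breakPoints(byte[] il, int len, int remainCharCount, int pushCharCount) {
        List<Integer> points = new ArrayList<>();
        for (int i = 1; i < len; i++) {               // only positions strictly inside the word
            boolean odd = (il[i] & 1) == 1;           // odd value: hyphenation allowed
            if (odd && i >= remainCharCount && len - i >= pushCharCount) {
                points.add(i);
            }
        }
        return points;
    }
}
```

With values 0, 1, 0, 3, 0, 1 for a five-letter word and both minimums set to 2, only offset 3 survives: offset 1 leaves too few characters before the hyphen.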
hyphenate
private Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount)
Hyphenate word and return an array of hyphenation points.
Parameters:
w - char array that contains the word
offset - Offset to first character in word
len - Length of word
remainCharCount - Minimum number of characters allowed before the hyphenation point.
pushCharCount - Minimum number of characters allowed after the hyphenation point.
Returns:
a Hyphenation object representing the hyphenated word or null if word is not hyphenated.
addClass
public void addClass(java.lang.String chargroup)
Add a character class to the tree. It is used by SimplePatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation: if a word contains a character not defined in any of the classes, it is not hyphenated. They also define a way to normalize the characters in order to compare them with the stored patterns. Pattern files usually use only lowercase characters; in this case the class for the letter 'a', for example, should be defined as "aA", the first character being the normalization character.
Specified by:
addClass in interface PatternConsumer
Parameters:
chargroup - character group
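The normalization rule above can be sketched as follows, assuming a `HashMap` in place of the `classmap` TernaryTree. `CharClasses` and `normalize` are hypothetical names; the sketch only shows the documented behavior: the first character of a group is the normalized form, and a word with an unclassified character is rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the classmap: each member maps to its normalization char.
final class CharClasses {

    private final Map<Character, Character> classmap = new HashMap<>();

    /** For a group like "aA", every char in the group normalizes to the first one. */
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char norm = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classmap.put(chargroup.charAt(i), norm);
        }
    }

    /** Returns the normalized word, or null if some char belongs to no class. */
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character norm = classmap.get(word.charAt(i));
            if (norm == null) {
                return null; // unknown character: the word is not hyphenated
            }
            sb.append(norm.charValue());
        }
        return sb.toString();
    }
}
```

After registering "aA" and "bB", the word "AbA" normalizes to "aba", whereas "a1" is rejected because '1' belongs to no class.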
addException
public void addException(java.lang.String word, java.util.ArrayList hyphenatedword)
Add an exception to the tree. It is used by the SimplePatternParser class as a callback to store the hyphenation exceptions.
Specified by:
addException in interface PatternConsumer
Parameters:
word - normalized word
hyphenatedword - a vector of alternating strings and hyphen objects.
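To illustrate what an alternating list of strings and hyphen objects encodes, here is a minimal sketch that turns such a list into break offsets. `HyphenMarker` is a hypothetical placeholder for the library's hyphen objects, and `ExceptionPoints.breakOffsets` is an illustrative helper, not part of this class's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads break offsets out of an exception entry.
final class ExceptionPoints {

    /** Hypothetical placeholder for the hyphen objects stored between word fragments. */
    static final class HyphenMarker { }

    /** For ["ta", hyphen, "ble"], returns [2]: a break is allowed after 2 letters. */
    static List<Integer> breakOffsets(List<?> hyphenatedword) {
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        for (Object o : hyphenatedword) {
            if (o instanceof String) {
                pos += ((String) o).length(); // advance over the fragment's letters
            } else if (o instanceof HyphenMarker) {
                offsets.add(pos);             // a hyphen separates the fragments here
            }
        }
        return offsets;
    }
}
```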
addPattern
public void addPattern(java.lang.String pattern, java.lang.String ivalue)
Add a pattern to the tree. Mainly to be used by the SimplePatternParser class as a callback to add a pattern to the tree.
Specified by:
addPattern in interface PatternConsumer
Parameters:
pattern - the hyphenation pattern
ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters ('0' to '9').
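This page does not show how SimplePatternParser derives the pattern and ivalue arguments. As a hedged sketch, one common way to split a Liang/TeX-style entry such as "hy3ph" is to strip the digits for the pattern and record one value per gap (letters + 1 values, defaulting to '0'). `PatternSplit.split` is a hypothetical helper.

```java
// Hypothetical helper: splits a Liang-style pattern into (pattern, ivalue).
final class PatternSplit {

    /** Returns {letters, ivalue}; ivalue has one digit per gap, letters.length() + 1 in all. */
    static String[] split(String raw) {
        StringBuilder letters = new StringBuilder();
        StringBuilder values = new StringBuilder();
        char pending = '0';                 // value for the gap before the next letter
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c >= '0' && c <= '9') {
                pending = c;
            } else {
                values.append(pending);
                letters.append(c);
                pending = '0';
            }
        }
        values.append(pending);             // gap after the last letter
        return new String[] {letters.toString(), values.toString()};
    }
}
```

For instance, ".hy3ph" splits into the pattern ".hyph" and the ivalue "000300", which could then be passed to addPattern.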
printStats
public void printStats()
Overrides:
printStats in class TernaryTree