Interface UForwardCharacterIterator
- All Known Implementing Classes:
UCharacterIterator
Characters can be accessed in two ways: as code units or as
code points.
Unicode code points are 21-bit integers and are the scalar values
of Unicode characters. ICU uses the type int
for them.
Unicode code units are the storage units of a given
Unicode/UCS Transformation Format (a character encoding scheme).
With UTF-16, all code points can be represented with either one
code unit or two code units (a "surrogate pair").
String storage is typically based on code units, while properties
of characters are typically determined using code point values.
Some processes may be designed to work with sequences of code units,
or it may be known that all characters that are important to an
algorithm can be represented with single code units.
Other processes will need to use the code point access functions.
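The distinction between code units and code points can be seen with plain JDK string methods. The following is a minimal sketch that uses only java.lang; the class name and the sample string are illustrative and are not part of this interface.

```java
// Sketch using only java.lang; the class name and sample text are illustrative.
class CodeUnitsVsCodePoints {
    // "a" followed by U+1F600, which UTF-16 stores as the surrogate pair D83D DE00.
    static final String SAMPLE = "a\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts UTF-16 code units.
        System.out.println(SAMPLE.length());                            // 3
        // codePointCount() counts code points.
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));  // 2
        // charAt() returns a single code unit, here the high surrogate.
        System.out.println(Integer.toHexString(SAMPLE.charAt(1)));      // d83d
        // codePointAt() combines the surrogate pair into one code point.
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1))); // 1f600
    }
}
```

A process that only needs storage offsets can work with the first view; one that looks up character properties generally needs the second.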
UForwardCharacterIterator provides next() to access
a code unit and advance an internal position into the text object,
similar to return text[position++].
It provides nextCodePoint() to access a code point and advance an internal
position.
nextCodePoint() assumes that the current position is that of
the beginning of a code point, i.e., of its first code unit.
After nextCodePoint(), this will be true again.
In general, access to code units and code points in the same
iteration loop should not be mixed. In UTF-16, if the current position
is on a second code unit (Low Surrogate), then only that code unit
is returned even by nextCodePoint().
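The effect of mixing the two access styles can be illustrated with a toy iterator. This is a sketch under stated assumptions: ForwardIter and StringForwardIter are hypothetical stand-ins that mirror the two documented methods over a String, and the DONE value of -1 is assumed for the sketch. They are not the ICU4J classes.

```java
// Hypothetical stand-in mirroring the two documented methods; not the ICU4J interface.
interface ForwardIter {
    int DONE = -1; // sentinel value assumed for this sketch
    int next();
    int nextCodePoint();
}

// Toy String-backed implementation, written only to illustrate the semantics.
class StringForwardIter implements ForwardIter {
    private final String text;
    private int position = 0;

    StringForwardIter(String text) {
        this.text = text;
    }

    // Returns the code unit at the current position and advances by one code unit.
    public int next() {
        if (position >= text.length()) {
            return DONE;
        }
        return text.charAt(position++);
    }

    // Returns the code point at the current position and advances past it.
    public int nextCodePoint() {
        if (position >= text.length()) {
            return DONE;
        }
        int cp = text.codePointAt(position);
        position += Character.charCount(cp);
        return cp;
    }
}

class MixedAccessDemo {
    // Reads one code unit and then code points from a single surrogate pair.
    static int[] mixedRead() {
        ForwardIter it = new StringForwardIter("\uD83D\uDE00");
        int first = it.next();           // 0xD83D, the high surrogate
        int second = it.nextCodePoint(); // 0xDE00, only the low surrogate
        int third = it.nextCodePoint();  // DONE
        return new int[] { first, second, third };
    }

    // Reads the same text with nextCodePoint() only.
    static int cleanRead() {
        return new StringForwardIter("\uD83D\uDE00").nextCodePoint(); // 0x1F600
    }
}
```

In mixedRead() the call to next() leaves the position in the middle of the pair, so the following nextCodePoint() can only return the low surrogate. cleanRead() starts on a code point boundary and returns U+1F600.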
Usage:
public void function1(UForwardCharacterIterator it) {
    int c;
    while ((c = it.next()) != UForwardCharacterIterator.DONE) {
        // use c
    }
}
Field Summary
Modifier and Type    Field    Description
static final int     DONE     Indicator that we have reached the end of the UTF-16 text.
Method Summary
Modifier and Type    Method             Description
int                  next()             Returns the UTF-16 code unit at index, and increments to the next code unit (post-increment semantics).
int                  nextCodePoint()    Returns the code point at index, and increments to the next code point (post-increment semantics).
Field Details
DONE
static final int DONE
Indicator that we have reached the end of the UTF-16 text.
Method Details
next
int next()
Returns the UTF-16 code unit at index, and increments to the next code unit (post-increment semantics). If index is out of range, DONE is returned, and the iterator is reset to the limit of the text.
Returns:
the next UTF-16 code unit, or DONE if the index is at the limit of the text.
nextCodePoint
int nextCodePoint()
Returns the code point at index, and increments to the next code point (post-increment semantics). If index does not point to a valid surrogate pair, the behavior is the same as next(). Otherwise the iterator is incremented past the surrogate pair, and the code point represented by the pair is returned.
Returns:
the next code point in text, or DONE if the index is at the limit of the text.
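The rule that nextCodePoint() applies at a given index can be sketched as a standalone helper built on java.lang.Character. The class and method names here are hypothetical and the helper is not taken from the ICU4J implementation; it only illustrates the documented behavior for an index that is within range.

```java
// Hypothetical helper illustrating the documented rule; not ICU4J code.
class NextCodePointRule {
    // Returns the code point that starts at index. If index does not start a
    // valid surrogate pair, the single code unit at index is returned instead.
    // The caller is expected to keep index within the bounds of the text.
    static int codePointOrUnit(CharSequence text, int index) {
        char first = text.charAt(index);
        if (Character.isHighSurrogate(first) && index + 1 < text.length()) {
            char second = text.charAt(index + 1);
            if (Character.isLowSurrogate(second)) {
                // Valid pair: combine both code units into one code point.
                return Character.toCodePoint(first, second);
            }
        }
        // BMP character or unpaired surrogate: same result as a code unit read.
        return first;
    }

    // Returns the index following the value read at index (one or two code units).
    static int advance(CharSequence text, int index) {
        return index + Character.charCount(codePointOrUnit(text, index));
    }
}
```

For a valid pair the position moves forward by two code units; in every other case it moves by one, which matches a plain code unit read.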