Package net.sf.saxon.serialize
Class UTF8Writer
java.lang.Object
java.io.Writer
net.sf.saxon.serialize.UTF8Writer
- All Implemented Interfaces:
Closeable, Flushable, Appendable, AutoCloseable, UnicodeWriter
Specialized buffering UTF-8 writer.
The main reason for a custom version is to allow efficient
buffer recycling; a second benefit is that the encoder has less
overhead when encoding short content (compared to the JDK default
codecs).
- Author:
- Tatu Saloranta. Modified by Michael Kay to enable efficient output of Unicode strings.
-
Field Summary
Fields
Modifier and Type                     Field         Description
(package private) int                 _surrogate    When outputting chars from BMP, surrogate pairs need to be coalesced.
(package private) static final int    SURR1_FIRST
(package private) static final int    SURR1_LAST
(package private) static final int    SURR2_FIRST
(package private) static final int    SURR2_LAST
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method                                        Description
void                 close()                                       Complete the writing of characters to the result.
void                 flush()                                       Flush the contents of any buffers.
void                 write(char[] cbuf)
void                 write(char[] cbuf, int off, int len)
void                 write(int c)                                  Write a single char.
void                 write(String str)                             Process a supplied string
void                 write(String str, int off, int len)
void                 write(UnicodeString chars)                    Process a supplied string
void                 writeAscii(byte[] content)                    Write a sequence of ASCII characters.
void                 writeAscii(byte[] chars, int off, int len)    Write a sequence of ASCII characters.
void                 writeCodePoint(int codepoint)                 Process a single character.
void                 writeLatin1(byte[] bytes, int off, int len)
void                 writeRepeatedAscii(byte ch, int repeat)       Write an ASCII character repeatedly.
Methods inherited from class java.io.Writer
append, append, append, nullWriter
-
Field Details
-
SURR1_FIRST
static final int SURR1_FIRST
-
SURR1_LAST
static final int SURR1_LAST
-
SURR2_FIRST
static final int SURR2_FIRST
-
SURR2_LAST
static final int SURR2_LAST
-
_surrogate
int _surrogate
When outputting chars from BMP, surrogate pairs need to be coalesced. To do this, both halves of a pair must be known first; and since the two halves may arrive separately, temporary storage is needed for the first half.
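The idea of holding the first half of a surrogate pair until the second half arrives can be sketched as follows. This is a minimal illustration, not the Saxon code: the class SurrogateCoalescer, its pendingHigh field, and the accept method are hypothetical names, and the sketch collects code points in a list instead of encoding them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: coalesces surrogate pairs that may be split across calls. */
final class SurrogateCoalescer {
    // First half of a pair awaiting its partner; 0 means nothing is pending.
    private char pendingHigh = 0;
    private final List<Integer> codePoints = new ArrayList<>();

    void accept(char[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            char c = buf[i];
            if (pendingHigh != 0) {
                // A high surrogate was seen earlier, so this char must complete the pair.
                if (!Character.isLowSurrogate(c)) {
                    throw new IllegalArgumentException("Unpaired high surrogate");
                }
                codePoints.add(Character.toCodePoint(pendingHigh, c));
                pendingHigh = 0;
            } else if (Character.isHighSurrogate(c)) {
                // Remember the first half; the second may arrive in a later call.
                pendingHigh = c;
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired low surrogate");
            } else {
                codePoints.add((int) c);
            }
        }
    }

    List<Integer> codePoints() {
        return codePoints;
    }
}
```

Feeding the two halves of a pair in separate accept calls yields a single code point, which is the situation the temporary storage is there to handle.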
-
-
Constructor Details
-
UTF8Writer
-
UTF8Writer
-
-
Method Details
-
close
Description copied from interface: UnicodeWriter
Complete the writing of characters to the result. The default implementation does nothing.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface UnicodeWriter
- Specified by:
close in class Writer
- Throws:
IOException - if processing fails for any reason
-
flush
Description copied from interface: UnicodeWriter
Flush the contents of any buffers. The default implementation does nothing.
- Specified by:
flush in interface Flushable
- Specified by:
flush in interface UnicodeWriter
- Specified by:
flush in class Writer
- Throws:
IOException - if processing fails for any reason
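Because a buffering writer holds characters until its buffer is flushed or the writer is closed, output may not be visible in the underlying stream immediately. The following sketch shows that effect using only JDK classes (BufferedWriter over OutputStreamWriter), since the UTF8Writer constructors are not listed on this page. The class FlushDemo and its method are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: observes the sink size before and after flush(). */
final class FlushDemo {
    /** Returns {bytes in sink before flush, bytes in sink after flush}. */
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources calls close(), which also flushes any remaining content.
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            w.write(text);
            int before = sink.size(); // short text is still held in the writer's buffer
            w.flush();
            int after = sink.size();  // the encoded bytes have now reached the sink
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For text much shorter than the buffer, the first value is 0 and the second is the UTF-8 length of the text.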
-
write
- Overrides:
write in class Writer
- Throws:
IOException
-
write
- Specified by:
write in class Writer
- Throws:
IOException
-
writeLatin1
- Throws:
IOException
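This page gives no description for writeLatin1. Assuming, from the name alone, that the bytes are ISO 8859-1 (Latin-1) characters to be emitted as UTF-8, the conversion can be sketched as below: bytes below 0x80 are copied unchanged, and bytes from 0x80 to 0xFF each become a two-byte UTF-8 sequence. The class Latin1ToUtf8 and its transcode method are hypothetical and not part of Saxon.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: transcodes a slice of Latin-1 bytes to UTF-8. */
final class Latin1ToUtf8 {
    static byte[] transcode(byte[] bytes, int off, int len) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(len * 2);
        for (int i = off; i < off + len; i++) {
            int c = bytes[i] & 0xFF; // treat the byte as an unsigned value 0..255
            if (c < 0x80) {
                out.write(c); // ASCII range: one byte, unchanged
            } else {
                out.write(0xC0 | (c >> 6));   // lead byte carries the top two bits
                out.write(0x80 | (c & 0x3F)); // continuation byte carries the low six bits
            }
        }
        return out.toByteArray();
    }
}
```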
-
writeAscii
Write a sequence of ASCII characters. The caller is responsible for ensuring that each byte represents a character in the range 1-127.
- Specified by:
writeAscii in interface UnicodeWriter
- Parameters:
content - the content to be written
- Throws:
IOException - if processing fails for any reason
-
writeAscii
Write a sequence of ASCII characters. The caller is responsible for ensuring that each byte represents a character in the range 1-127.
- Parameters:
chars - the characters to be written
off - the offset of the first character to be included
len - the number of characters to be written
- Throws:
IOException
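Since the caller is responsible for the 1-127 range, a caller that cannot guarantee it by construction might check the slice before passing it on. A minimal sketch of such a check follows; AsciiCheck and isAsciiSlice are hypothetical helper names, not part of Saxon.

```java
/** Hypothetical caller-side check for the 1-127 precondition. */
final class AsciiCheck {
    /** Returns true if every byte in bytes[off .. off+len-1] is in the range 1-127. */
    static boolean isAsciiSlice(byte[] bytes, int off, int len) {
        for (int i = off; i < off + len; i++) {
            // Java bytes are signed: values 0x80-0xFF appear as negatives,
            // so "<= 0" rejects both zero and every non-ASCII byte.
            if (bytes[i] <= 0) {
                return false;
            }
        }
        return true;
    }
}
```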
-
writeRepeatedAscii
Write an ASCII character repeatedly. Used for serializing whitespace.
- Specified by:
writeRepeatedAscii in interface UnicodeWriter
- Parameters:
ch - the ASCII character to be serialized (must be less than 0x7f)
repeat - the number of occurrences to output
- Throws:
IOException - if it fails
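One way to emit a repeated ASCII byte, for example for indentation, is to fill a small chunk once and write it as many times as needed. The sketch below illustrates that approach under stated assumptions: RepeatedAscii, writeRepeated, and the 64-byte chunk size are hypothetical choices, and the code writes to a ByteArrayOutputStream instead of a UTF8Writer.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical sketch: writes the same ASCII byte a given number of times. */
final class RepeatedAscii {
    static void writeRepeated(ByteArrayOutputStream out, byte ch, int repeat) {
        // Mirrors the documented precondition: the character must be less than 0x7f.
        if (ch < 0 || ch >= 0x7f) {
            throw new IllegalArgumentException("Not an accepted ASCII character: " + ch);
        }
        if (repeat <= 0) {
            return; // nothing to write
        }
        // Fill one chunk, then reuse it instead of writing byte by byte.
        byte[] chunk = new byte[Math.min(repeat, 64)];
        Arrays.fill(chunk, ch);
        int remaining = repeat;
        while (remaining > 0) {
            int n = Math.min(remaining, chunk.length);
            out.write(chunk, 0, n);
            remaining -= n;
        }
    }
}
```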
-
writeCodePoint
Process a single character. Default implementation wraps the codepoint into a single-character UnicodeString.
- Specified by:
writeCodePoint in interface UnicodeWriter
- Parameters:
codepoint - the character to be processed. Must not be a surrogate
- Throws:
IOException - if processing fails for any reason
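For reference, encoding a single non-surrogate code point as UTF-8 takes one to four bytes depending on its value. The sketch below shows the standard UTF-8 bit layout; it is not taken from the Saxon source, and CodePointEncoder and encode are hypothetical names.

```java
/** Hypothetical sketch: encodes one Unicode code point as UTF-8. */
final class CodePointEncoder {
    static byte[] encode(int cp) {
        // Surrogates are not characters in their own right, so they are rejected.
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not an encodable code point: " + cp);
        }
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        // Supplementary characters (above U+FFFF) need four bytes.
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }
}
```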
-
write
Write a single char.
Note (MHK): Although the Writer interface says that the top half of the int is ignored, this implementation appears to accept a Unicode codepoint which is output as a 4-byte UTF-8 sequence.
- Overrides:
write in class Writer
- Parameters:
c - the char to be written
- Throws:
IOException - If an I/O error occurs
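The Writer contract mentioned in the note can be seen with a plain JDK StringWriter, which keeps only the low 16 bits of the int passed to write(int). The sketch below contrasts that with expanding a code point into UTF-16 chars first. WriteIntContract and its methods are hypothetical names; the sketch does not exercise UTF8Writer itself.

```java
import java.io.StringWriter;

/** Hypothetical sketch: contrasts Writer.write(int) with code point expansion. */
final class WriteIntContract {
    /** Writes the int through a JDK Writer, which ignores the high 16 bits. */
    static String viaJdkWriter(int value) {
        StringWriter w = new StringWriter();
        w.write(value);
        return w.toString();
    }

    /** Expands the code point to one or two UTF-16 chars before writing. */
    static String viaCodePoint(int codePoint) {
        StringWriter w = new StringWriter();
        char[] chars = Character.toChars(codePoint);
        w.write(chars, 0, chars.length);
        return w.toString();
    }
}
```

With the supplementary code point 0x1F600, viaJdkWriter yields the single char U+F600, while viaCodePoint yields the surrogate pair for U+1F600. Code that targets the general Writer contract should therefore not rely on passing a full code point to write(int).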
-
write
Process a supplied string
- Specified by:
write in interface UnicodeWriter
- Parameters:
chars - the characters to be processed
- Throws:
IOException - if processing fails for any reason
-
write
Description copied from interface: UnicodeWriter
Process a supplied string
- Specified by:
write in interface UnicodeWriter
- Overrides:
write in class Writer
- Parameters:
str - the characters to be processed
- Throws:
IOException - if processing fails for any reason
-
write
- Overrides:
write in class Writer
- Throws:
IOException
-