Class UTF8Util

java.lang.Object
org.apache.derby.iapi.util.UTF8Util

public final class UTF8Util extends Object
Utility methods for handling UTF-8 encoded byte streams.

Note that where the skip methods mention detection of invalid UTF-8 encodings, only the first byte of each character is checked. For multibyte encodings, the second and third bytes are not checked for correctness; they are simply skipped.
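
The first-byte check described above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not part of UTF8Util. The sketch also assumes that only one-, two- and three-byte encodings are accepted, since the note mentions only second and third bytes.

```java
import java.io.UTFDataFormatException;

// Hypothetical helper, for illustration only; not part of UTF8Util.
final class LeadByteSketch {

    /**
     * Returns the total encoded length (1 to 3 bytes) implied by the first
     * byte of a character, passed as an unsigned value in the range 0-255.
     */
    public static int sequenceLength(int firstByte) throws UTFDataFormatException {
        if ((firstByte & 0x80) == 0x00) {
            return 1; // 0xxxxxxx: single-byte character
        }
        if ((firstByte & 0xE0) == 0xC0) {
            return 2; // 110xxxxx: one trailing byte follows
        }
        if ((firstByte & 0xF0) == 0xE0) {
            return 3; // 1110xxxx: two trailing bytes follow
        }
        // Assumption: anything else (a stray continuation byte or a longer
        // form) is rejected as an invalid first byte.
        throw new UTFDataFormatException("Invalid first byte: " + firstByte);
    }
}
```

Because only the first byte is inspected, a sequence such as 0xC3 followed by 0x41 would be counted as one two-byte character even though 0x41 is not a valid continuation byte.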

  • Constructor Details

    • UTF8Util

      private UTF8Util()
      This class cannot be instantiated.
  • Method Details

    • skipUntilEOF

      public static final long skipUntilEOF(InputStream in) throws IOException
      Skip until the end-of-stream is reached.
      Parameters:
      in - byte stream with UTF-8 encoded characters
      Returns:
      The number of characters skipped.
      Throws:
      IOException - if reading from the stream fails
      UTFDataFormatException - if an invalid UTF-8 encoding is detected
    • skipFully

      public static final long skipFully(InputStream in, long charsToSkip) throws EOFException, IOException
      Skip the requested number of characters from the stream.

      Parameters:
      in - byte stream with UTF-8 encoded characters
      charsToSkip - number of characters to skip
      Returns:
      The number of bytes skipped.
      Throws:
      EOFException - if end-of-stream is reached before the requested number of characters are skipped
      IOException - if reading from the stream fails
      UTFDataFormatException - if an invalid UTF-8 encoding is detected
    • internalSkip

      private static final UTF8Util.SkipCount internalSkip(InputStream in, long charsToSkip) throws IOException
      Skip characters in the stream.

      Note that a smaller number than requested might be skipped if the end-of-stream is reached before the specified number of characters has been decoded. It is up to the caller to decide if this is an error or not. For instance, when determining the character length of a stream, Long.MAX_VALUE could be passed as the requested number of characters to skip.

      Parameters:
      in - byte stream with UTF-8 encoded characters
      charsToSkip - the number of characters to skip
      Returns:
      A UTF8Util.SkipCount object holding the counts: the number of characters skipped and the number of bytes skipped. Note that the number of characters skipped may be smaller than the requested number.
      Throws:
      IOException - if reading from the stream fails
      UTFDataFormatException - if an invalid UTF-8 encoding is detected
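
The relationship between the three skip methods can be sketched as follows. This is a standalone approximation written from the descriptions above, not Derby's implementation: the class Utf8SkipSketch, the Counts holder and all method names are hypothetical, and the handling of a character truncated by end-of-stream is an assumption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UTFDataFormatException;

// Hypothetical stand-in for the documented behavior; not Derby code.
final class Utf8SkipSketch {

    /** Hypothetical result holder carrying both counts. */
    static final class Counts {
        final long chars;
        final long bytes;

        Counts(long chars, long bytes) {
            this.chars = chars;
            this.bytes = bytes;
        }
    }

    /** Skips up to charsToSkip characters; stops early, without error, at end-of-stream. */
    static Counts skipUpTo(InputStream in, long charsToSkip) throws IOException {
        long chars = 0;
        long bytes = 0;
        while (chars < charsToSkip) {
            int first = in.read();
            if (first == -1) {
                break; // end-of-stream: the caller decides whether this is an error
            }
            int length;
            if ((first & 0x80) == 0x00) {
                length = 1;
            } else if ((first & 0xE0) == 0xC0) {
                length = 2;
            } else if ((first & 0xF0) == 0xE0) {
                length = 3;
            } else {
                throw new UTFDataFormatException("Invalid first byte: " + first);
            }
            // Trailing bytes are consumed without being validated.
            for (int i = 1; i < length; i++) {
                if (in.read() == -1) {
                    // Assumption: a truncated character is reported as a format error.
                    throw new UTFDataFormatException("Truncated character at end-of-stream");
                }
            }
            chars++;
            bytes += length;
        }
        return new Counts(chars, bytes);
    }

    /** skipFully-style wrapper: a short skip is an error, and the byte count is returned. */
    static long skipExactly(InputStream in, long charsToSkip) throws IOException {
        Counts counts = skipUpTo(in, charsToSkip);
        if (counts.chars != charsToSkip) {
            throw new EOFException("Reached end-of-stream after " + counts.chars
                    + " of " + charsToSkip + " characters");
        }
        return counts.bytes;
    }

    /** skipUntilEOF-style wrapper: request Long.MAX_VALUE characters, report how many were found. */
    static long countChars(InputStream in) throws IOException {
        return skipUpTo(in, Long.MAX_VALUE).chars;
    }
}
```

For a stream holding the UTF-8 encoding of "a\u00E9\u20ACz" (one, two, three and one bytes respectively), skipExactly(in, 3) in this sketch returns 6 and leaves the stream positioned at "z", while countChars on a fresh copy of the same stream returns 4. The difference between the two return values mirrors the distinction the documentation draws between skipFully (bytes) and skipUntilEOF (characters).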