Class UTF8Util


  • public final class UTF8Util
    extends java.lang.Object
    Utility methods for handling UTF-8 encoded byte streams.

    Note that where the skip methods mention detection of invalid UTF-8 encodings, only the first byte of each character is checked. For multibyte encodings, the second and third bytes are not checked for correctness; they are simply skipped.
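
    The first-byte check described above can be sketched as follows. This is a minimal illustration, not the class's actual source: it assumes the modified UTF-8 layout documented for java.io.DataInput, in which a character occupies one to three bytes, and the names LeadByteSketch and encodedLength are hypothetical.

```java
import java.io.UTFDataFormatException;

// Hypothetical helper for illustration; not part of UTF8Util.
final class LeadByteSketch {
    private LeadByteSketch() {}

    /**
     * Returns the number of bytes (1, 2 or 3) that the first byte of a
     * character announces. Only this first byte is inspected.
     */
    static int encodedLength(int firstByte) throws UTFDataFormatException {
        if ((firstByte & 0x80) == 0x00) {
            return 1; // 0xxxxxxx: single-byte character
        }
        if ((firstByte & 0xE0) == 0xC0) {
            return 2; // 110xxxxx: one more byte follows
        }
        if ((firstByte & 0xF0) == 0xE0) {
            return 3; // 1110xxxx: two more bytes follow
        }
        // 10xxxxxx and 1111xxxx cannot start a character in this layout.
        throw new UTFDataFormatException(
                "Invalid first byte: 0x" + Integer.toHexString(firstByte));
    }

    public static void main(String[] args) throws UTFDataFormatException {
        System.out.println(encodedLength(0x61)); // 'a'
        System.out.println(encodedLength(0xC3)); // first byte of U+00E9
        System.out.println(encodedLength(0xE2)); // first byte of U+20AC
    }
}
```

    Because the following bytes are never inspected, a sequence such as 0xC3 0x41 would pass a check of this kind even though 0x41 is not a valid second byte.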

    See Also:
    DataInput
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  UTF8Util.SkipCount
      Helper class to hold skip counts; one for chars and one for bytes.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private UTF8Util()
      This class cannot be instantiated.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      private static UTF8Util.SkipCount internalSkip(java.io.InputStream in, long charsToSkip)
      Skip characters in the stream.
      static long skipFully(java.io.InputStream in, long charsToSkip)
      Skip the requested number of characters from the stream.
      static long skipUntilEOF(java.io.InputStream in)
      Skip until the end-of-stream is reached.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • UTF8Util

        private UTF8Util()
        This class cannot be instantiated.
    • Method Detail

      • skipUntilEOF

        public static final long skipUntilEOF(java.io.InputStream in)
                                       throws java.io.IOException
        Skip until the end-of-stream is reached.
        Parameters:
        in - byte stream with UTF-8 encoded characters
        Returns:
        The number of characters skipped.
        Throws:
        java.io.IOException - if reading from the stream fails
        java.io.UTFDataFormatException - if an invalid UTF-8 encoding is detected
      • skipFully

        public static final long skipFully(java.io.InputStream in,
                                           long charsToSkip)
                                    throws java.io.EOFException,
                                           java.io.IOException
        Skip the requested number of characters from the stream.

        Parameters:
        in - byte stream with UTF-8 encoded characters
        charsToSkip - number of characters to skip
        Returns:
        The number of bytes skipped.
        Throws:
        java.io.EOFException - if end-of-stream is reached before the requested number of characters are skipped
        java.io.IOException - if reading from the stream fails
        java.io.UTFDataFormatException - if an invalid UTF-8 encoding is detected
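
        The contract above (characters requested, bytes returned, EOFException on a short stream) can be illustrated with a self-contained sketch. It is a stand-in written for this page, not the actual implementation: the class name SkipFullySketch is hypothetical, the one-to-three-byte layout is assumed from java.io.DataInput, and treating a truncated multibyte character as a UTFDataFormatException is likewise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the actual UTF8Util source.
final class SkipFullySketch {
    private SkipFullySketch() {}

    /** Skips exactly charsToSkip characters and returns the bytes consumed. */
    static long skipFully(InputStream in, long charsToSkip) throws IOException {
        long chars = 0;
        long bytes = 0;
        while (chars < charsToSkip) {
            int first = in.read();
            if (first == -1) {
                throw new EOFException("End-of-stream after " + chars
                        + " of " + charsToSkip + " characters");
            }
            int length;
            if ((first & 0x80) == 0x00) {
                length = 1; // 0xxxxxxx
            } else if ((first & 0xE0) == 0xC0) {
                length = 2; // 110xxxxx
            } else if ((first & 0xF0) == 0xE0) {
                length = 3; // 1110xxxx
            } else {
                throw new UTFDataFormatException(
                        "Invalid first byte at byte position " + bytes);
            }
            // The remaining bytes of the character are consumed unchecked.
            for (int i = 1; i < length; i++) {
                if (in.read() == -1) {
                    throw new UTFDataFormatException(
                            "Truncated character at byte position " + bytes);
                }
            }
            chars++;
            bytes += length;
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Four characters encoded in 1 + 2 + 3 + 1 bytes.
        byte[] data = "a\u00e9\u20acb".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        // Skipping three characters consumes 1 + 2 + 3 bytes.
        System.out.println(skipFully(in, 3));
    }
}
```

        The return value is a byte count, so callers that track a byte position can advance it directly, while the argument is always expressed in characters.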
      • internalSkip

        private static final UTF8Util.SkipCount internalSkip(java.io.InputStream in,
                                                             long charsToSkip)
                                                      throws java.io.IOException
        Skip characters in the stream.

        Note that fewer characters than requested may be skipped if the end-of-stream is reached before the requested number of characters has been decoded. It is up to the caller to decide whether this is an error. For instance, to determine the character length of a stream, Long.MAX_VALUE can be passed as the number of characters to skip.

        Parameters:
        in - byte stream with UTF-8 encoded characters
        charsToSkip - the number of characters to skip
        Returns:
        A UTF8Util.SkipCount holding the number of characters skipped and the number of bytes skipped. Note that the number of characters skipped may be smaller than the requested number.
        Throws:
        java.io.IOException - if reading from the stream fails
        java.io.UTFDataFormatException - if an invalid UTF-8 encoding is detected
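
        The lenient behaviour at end-of-stream, and the Long.MAX_VALUE idiom for measuring a stream, might look like the sketch below. It is written under stated assumptions rather than taken from the class: InternalSkipSketch, its SkipCount holder, the accessor names charsSkipped and bytesSkipped, and the helper charLength are all hypothetical, and the one-to-three-byte layout is assumed from java.io.DataInput.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not the actual UTF8Util source.
final class InternalSkipSketch {
    private InternalSkipSketch() {}

    /** Holds both counts; the accessor names are assumptions. */
    static final class SkipCount {
        private final long chars;
        private final long bytes;

        SkipCount(long chars, long bytes) {
            this.chars = chars;
            this.bytes = bytes;
        }

        long charsSkipped() { return chars; }
        long bytesSkipped() { return bytes; }
    }

    /** Skips up to charsToSkip characters; stops quietly at end-of-stream. */
    static SkipCount internalSkip(InputStream in, long charsToSkip)
            throws IOException {
        long chars = 0;
        long bytes = 0;
        while (chars < charsToSkip) {
            int first = in.read();
            if (first == -1) {
                break; // not an error here; the caller decides
            }
            int length;
            if ((first & 0x80) == 0x00) {
                length = 1; // 0xxxxxxx
            } else if ((first & 0xE0) == 0xC0) {
                length = 2; // 110xxxxx
            } else if ((first & 0xF0) == 0xE0) {
                length = 3; // 1110xxxx
            } else {
                throw new UTFDataFormatException(
                        "Invalid first byte at byte position " + bytes);
            }
            // The remaining bytes of the character are consumed unchecked.
            for (int i = 1; i < length; i++) {
                if (in.read() == -1) {
                    throw new UTFDataFormatException(
                            "Truncated character at byte position " + bytes);
                }
            }
            chars++;
            bytes += length;
        }
        return new SkipCount(chars, bytes);
    }

    /** Character length of the rest of the stream, via Long.MAX_VALUE. */
    static long charLength(InputStream in) throws IOException {
        return internalSkip(in, Long.MAX_VALUE).charsSkipped();
    }

    public static void main(String[] args) throws IOException {
        // Four characters encoded in 1 + 2 + 3 + 1 bytes.
        byte[] data = "a\u00e9\u20acb".getBytes(StandardCharsets.UTF_8);
        SkipCount count = internalSkip(new ByteArrayInputStream(data), 10);
        System.out.println(count.charsSkipped() + " chars, "
                + count.bytesSkipped() + " bytes");
    }
}
```

        Returning both counts in one object lets a strict caller compare the character count with its request and raise an error, while a measuring caller simply reads the totals.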