Package org.fife.io

Class UnicodeReader

java.lang.Object
java.io.Reader
org.fife.io.UnicodeReader
All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class UnicodeReader extends Reader
A reader capable of identifying Unicode streams by their byte order marks (BOMs). This class recognizes the following encodings:
  • UTF-8
  • UTF-16LE
  • UTF-16BE
  • UTF-32LE
  • UTF-32BE
If the stream does not begin with a BOM for one of the above encodings, a default encoding is used for reading. The caller can specify this default encoding; otherwise, a system default is used.

For optimum performance, it is recommended that you wrap all instances of UnicodeReader with a java.io.BufferedReader.
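
The wrapping recommended above might look like the following minimal sketch. The class BufferedWrapSketch and its readLines helper are hypothetical names used only for illustration, and a plain InputStreamReader stands in for UnicodeReader so that the example runs without the library on the classpath.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of org.fife.io.
final class BufferedWrapSketch {

    // Wraps any Reader in a BufferedReader and collects its lines.
    // Closing the BufferedReader also closes the wrapped reader.
    static List<String> readLines(Reader unbuffered) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(unbuffered)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first\nsecond\n".getBytes(StandardCharsets.UTF_8);
        // Stand-in reader (assumption). With the library available, this
        // would instead be something like:
        //   new UnicodeReader(new ByteArrayInputStream(data), "UTF-8")
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(readLines(reader)); // prints [first, second]
    }
}
```

Because BufferedReader accepts any Reader, the same helper should work unchanged when handed a UnicodeReader.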

This class is mostly ripped off from the workaround in the description of Java Bug 4508058.

Version:
0.9
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int
    BOM_SIZE
    The size of a BOM.
    private String
    encoding
    The encoding being used.
    private InputStreamReader
    internalIn
    The input stream from which we're really reading.

    Fields inherited from class java.io.Reader

    lock
  • Constructor Summary

    Constructors
    Constructor
    Description
    UnicodeReader(File file)
    This utility constructor is here because you will usually use a UnicodeReader on files.
    UnicodeReader(File file, String defaultEncoding)
    This utility constructor is here because you will usually use a UnicodeReader on files.
    UnicodeReader(File file, Charset defaultCharset)
    This utility constructor is here because you will usually use a UnicodeReader on files.
    UnicodeReader(InputStream in)
    Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then a system default encoding is used.
    UnicodeReader(InputStream in, String defaultEncoding)
    Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then defaultEncoding is used.
    UnicodeReader(InputStream in, Charset defaultCharset)
    Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then defaultCharset is used.
    UnicodeReader(String file)
    This utility constructor is here because you will usually use a UnicodeReader on files.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    close()
    Closes this reader.
    String
    getEncoding()
    Returns the encoding being used to read this input stream (i.e., the encoding of the file).
    protected void
    init(InputStream in, String defaultEncoding)
    Reads ahead four bytes and checks for a BOM.
    int
    read(char[] cbuf, int off, int len)
    Reads characters into a portion of an array.

    Methods inherited from class java.io.Reader

    mark, markSupported, nullReader, read, read, read, ready, reset, skip, transferTo

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • internalIn

      private InputStreamReader internalIn
      The input stream from which we're really reading.
    • encoding

      private String encoding
      The encoding being used. This class keeps its own copy instead of using the string returned by java.io.InputStreamReader, since that class does not return user-friendly names.
    • BOM_SIZE

      private static final int BOM_SIZE
      The size of a BOM.
  • Constructor Details

    • UnicodeReader

      public UnicodeReader(String file) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a system default encoding is used.

      Parameters:
      file - The file from which you want to read.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(File file) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a system default encoding is used.

      Parameters:
      file - The file from which you want to read.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(File file, String defaultEncoding) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a specified default encoding is used.

      Parameters:
      file - The file from which you want to read.
      defaultEncoding - The encoding to use if no BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(File file, Charset defaultCharset) throws IOException
      This utility constructor is here because you will usually use a UnicodeReader on files.

      Creates a reader using the encoding specified by the BOM in the file; if there is no recognized BOM, then a specified default encoding is used.

      Parameters:
      file - The file from which you want to read.
      defaultCharset - The encoding to use if no BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
      SecurityException - If a security manager exists and its checkRead method denies read access to the file.
    • UnicodeReader

      public UnicodeReader(InputStream in) throws IOException
      Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then a system default encoding is used.
      Parameters:
      in - The input stream from which to read.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
    • UnicodeReader

      public UnicodeReader(InputStream in, String defaultEncoding) throws IOException
      Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then defaultEncoding is used.
      Parameters:
      in - The input stream from which to read.
      defaultEncoding - The encoding to use if no recognized BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
    • UnicodeReader

      public UnicodeReader(InputStream in, Charset defaultCharset) throws IOException
      Creates a reader using the encoding specified by the BOM in the stream; if there is no recognized BOM, then defaultCharset is used.
      Parameters:
      in - The input stream from which to read.
      defaultCharset - The encoding to use if no recognized BOM is found. If this value is null, a system default is used.
      Throws:
      IOException - If an error occurs when checking for/reading the BOM.
  • Method Details

    • close

      public void close() throws IOException
      Closes this reader.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Reader
      Throws:
      IOException - If an I/O error occurs.
    • getEncoding

      public String getEncoding()
      Returns the encoding being used to read this input stream (i.e., the encoding of the file). If a BOM was recognized, then the specific Unicode type is returned; otherwise, either the default encoding passed into the constructor or the system default is returned.
      Returns:
      The encoding of the stream.
    • init

      protected void init(InputStream in, String defaultEncoding) throws IOException
      Reads ahead four bytes and checks for a BOM. Any extra bytes are pushed back onto the stream; only the BOM bytes are skipped.
      Parameters:
      in - The input stream from which to read.
      defaultEncoding - The encoding to use if no BOM was recognized. If this value is null, then a system default is used.
      Throws:
      IOException - If an error occurs when trying to read a BOM.
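
The read-ahead and push-back behavior described for init can be sketched with java.io.PushbackInputStream. This is an illustration under stated assumptions, not the class's actual source: BomSniffSketch and detect are hypothetical names, and the BOM byte patterns are the standard Unicode signatures for the five encodings listed in the class description.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

// Hypothetical sketch class; not part of org.fife.io.
final class BomSniffSketch {

    // Longest BOM handled here (UTF-32) is four bytes.
    static final int BOM_SIZE = 4;

    // Returns the encoding named by a leading BOM, or fallback if none is
    // recognized. On return, the stream is positioned just past the BOM.
    // The stream must have a pushback buffer of at least BOM_SIZE bytes.
    static String detect(PushbackInputStream in, String fallback)
            throws IOException {
        byte[] b = new byte[BOM_SIZE];
        int n = 0;
        // Read up to four bytes; short streams simply yield fewer.
        while (n < BOM_SIZE) {
            int r = in.read(b, n, BOM_SIZE - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        String encoding = fallback;
        int bomLen = 0;
        // The UTF-32 checks come first because the UTF-32LE BOM begins
        // with the same two bytes as the UTF-16LE BOM.
        if (n >= 4 && b[0] == 0x00 && b[1] == 0x00
                && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) {
            encoding = "UTF-32BE";
            bomLen = 4;
        } else if (n >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE
                && b[2] == 0x00 && b[3] == 0x00) {
            encoding = "UTF-32LE";
            bomLen = 4;
        } else if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB
                && b[2] == (byte) 0xBF) {
            encoding = "UTF-8";
            bomLen = 3;
        } else if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            encoding = "UTF-16BE";
            bomLen = 2;
        } else if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            encoding = "UTF-16LE";
            bomLen = 2;
        }

        // Push back everything that was read beyond the BOM itself.
        if (n > bomLen) {
            in.unread(b, bomLen, n - bomLen);
        }
        return encoding;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(data), BOM_SIZE);
        String encoding = detect(in, "ISO-8859-1");
        try (Reader r = new InputStreamReader(in, encoding)) {
            char[] buf = new char[16];
            int n = r.read(buf, 0, buf.length);
            System.out.println(encoding + ": " + new String(buf, 0, n));
        }
    }
}
```

The order of the checks matters: testing for UTF-16LE before UTF-32LE would misidentify a UTF-32LE stream, since both BOMs start with FF FE.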
    • read

      public int read(char[] cbuf, int off, int len) throws IOException
      Reads characters into a portion of an array. This method will block until some input is available, an I/O error occurs, or the end of the stream is reached.
      Specified by:
      read in class Reader
      Parameters:
      cbuf - The buffer into which to read.
      off - The offset at which to start storing characters.
      len - The maximum number of characters to read.
      Returns:
      The number of characters read, or -1 if the end of the stream has been reached.
      Throws:
      IOException - If an I/O error occurs.
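
A typical caller drains the reader by looping on read(char[], int, int) until it returns -1. The sketch below shows that loop; ReadLoopSketch and readAll are hypothetical names, and a StringReader stands in for UnicodeReader so the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch class; not part of org.fife.io.
final class ReadLoopSketch {

    // Reads everything from the given reader using the
    // read(char[], int, int) contract: each call returns the number of
    // characters stored, or -1 once the end of the stream is reached.
    static String readAll(Reader reader) throws IOException {
        char[] cbuf = new char[8]; // deliberately small to force several reads
        StringBuilder sb = new StringBuilder();
        int n;
        while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
            // Only the first n characters of cbuf are valid for this call.
            sb.append(cbuf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a UnicodeReader (assumption); any Reader works here.
        try (Reader r = new StringReader("hello, reader")) {
            System.out.println(readAll(r));
        }
    }
}
```

A call may return fewer characters than len even before the end of the stream, which is why the loop appends only the first n characters each time.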