Class MimeTypeDetector

java.lang.Object
org.apache.sis.internal.storage.xml.MimeTypeDetector

abstract class MimeTypeDetector extends Object
Detects the MIME type of an XML document from the namespace of the root element. This class does not support encoding: it will search only for US-ASCII characters. It does not prevent usage with encodings like ISO-LATIN-1 or UTF-8, provided that the characters in the [32 … 122] range (from space to 'z') are the same and cannot be used as part of a multi-byte character.

This class tries to implement a lightweight detection mechanism. We cannot for instance unmarshal the whole document with JAXB and look at the class of unmarshalled object, since it would be way too heavy.
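The encoding remark above can be checked with a small experiment. The following is only an illustrative sketch (the class and method names are hypothetical and not part of Apache SIS): it verifies that a text made of US-ASCII characters produces identical bytes in US-ASCII, ISO-8859-1 and UTF-8, and that the UTF-8 encoding of non-ASCII characters never contains a byte in the ASCII range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Apache SIS. */
final class AsciiCompatibilitySketch {
    /** Returns true if the text has the same bytes in US-ASCII, ISO-8859-1 and UTF-8. */
    static boolean sameBytesInAllThree(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, text.getBytes(StandardCharsets.ISO_8859_1))
            && Arrays.equals(ascii, text.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns true if any byte of the UTF-8 encoding falls in the 0 to 127 range. */
    static boolean hasAsciiByte(String text) {
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (b >= 0) {           // Java bytes are signed: 0x80 and above are negative.
                return true;
            }
        }
        return false;
    }
}
```

Because every byte of a UTF-8 multi-byte sequence is 0x80 or above, a scanner that looks only for ASCII bytes such as '<', ':' or '=' cannot mistake part of a multi-byte character for markup.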

Since:
0.4
Version:
1.0
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private byte[]
    buffer
    A buffer for reading a word from the XML document, assumed to use US-ASCII characters.
    (package private) boolean
    insufficientBytes
    Set to true when a read() implementation has reached the ByteBuffer limit, but the buffer has enough capacity for more bytes.
    private int
    length
    Number of valid characters in buffer.
    private static final int
    MAX_ASCII
    The maximal US-ASCII value, inclusive.
    private final Map<String,String>
    mimeForNameSpaces
    The mapping from XML namespaces to MIME types.
    private final Map<String,String>
    mimeForRootElements
    The mapping from root elements to MIME types.
    private static final byte[]
    XMLNS
    The "xmlns" string as a sequence of bytes.
  • Constructor Summary

    Constructors
    Constructor
    Description
    MimeTypeDetector(Map<String,String> mimeForNameSpaces, Map<String,String> mimeForRootElements)
    Creates a new instance.
  • Method Summary

    Modifier and Type
    Method
    Description
    private int
    afterSpaces(int c)
    If the given character is a space, skips it and all following spaces.
    private String
    current()
    Returns the current buffer content as a US-ASCII string.
    (package private) final String
    getMimeType()
    Returns the MIME type, or null if unknown.
    private int
    matches(byte[] word, int n, int c, char separator)
    Skips the spaces if any, then the given characters, then the spaces, then the given separator.
    (package private) final ProbeResult
    probeContent()
    Wraps the call to getMimeType() for catching IOException and for instantiating the ProbeResult.
    (package private) abstract int
    read()
    Reads a single byte or character, or -1 if we reached the end of the stream portion that we are allowed to read.
    private int
    readAfter(int search)
    Skips all bytes or characters up to search, then returns the character after it.
    private void
    remember(int c)
    Adds the given byte in the buffer, increasing its capacity if needed.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • mimeForNameSpaces

      private final Map<String,String> mimeForNameSpaces
      The mapping from XML namespaces to MIME types. This map shall be read-only, since we do not synchronize it.
    • mimeForRootElements

      private final Map<String,String> mimeForRootElements
      The mapping from root elements to MIME types. Used only if the root element is in the default namespace and contains no xmlns attributes for that namespace.
    • XMLNS

      private static final byte[] XMLNS
      The "xmlns" string as a sequence of bytes.
    • MAX_ASCII

      private static final int MAX_ASCII
      The maximal US-ASCII value, inclusive.
    • buffer

      private byte[] buffer
      A buffer for reading a word from the XML document, assumed to use US-ASCII characters.
    • length

      private int length
      Number of valid characters in buffer.
    • insufficientBytes

      boolean insufficientBytes
      Set to true when a read() implementation has reached the ByteBuffer limit, but the buffer has enough capacity for more bytes. In such a case the probeContent() method will return ProbeResult.INSUFFICIENT_BYTES, which means that the method requests more bytes for detecting the MIME type.
  • Constructor Details

    • MimeTypeDetector

      MimeTypeDetector(Map<String,String> mimeForNameSpaces, Map<String,String> mimeForRootElements)
      Creates a new instance.
      Parameters:
      mimeForNameSpaces - the mapping from XML namespaces to MIME types.
      mimeForRootElements - the mapping from root elements to MIME types, used only as a fallback.
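As a rough illustration of the two maps, the sketch below builds unmodifiable maps and applies the fallback rule described for mimeForRootElements. The class name, the lookup method and the map entries are assumptions chosen for the example; they are not the tables used by Apache SIS.

```java
import java.util.Map;

/** Hypothetical illustration of the two lookup tables; not part of Apache SIS. */
final class MimeMapsSketch {
    /** Example namespace table. The entry is illustrative only. Map.of(...) is unmodifiable. */
    static final Map<String,String> BY_NAMESPACE = Map.of(
            "http://www.topografix.com/GPX/1/1", "application/gpx+xml");

    /** Example root element table, consulted only when no namespace was declared. */
    static final Map<String,String> BY_ROOT_ELEMENT = Map.of(
            "gpx", "application/gpx+xml");

    /**
     * Returns the MIME type for the given namespace, falling back on the root
     * element name only if the namespace is null (no xmlns attribute found).
     */
    static String lookup(String namespace, String rootElement) {
        if (namespace != null) {
            return BY_NAMESPACE.get(namespace);
        }
        return BY_ROOT_ELEMENT.get(rootElement);
    }
}
```

Using unmodifiable maps is one way to honor the read-only requirement stated for mimeForNameSpaces, since the detector does not synchronize access to them.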
  • Method Details

    • current

      private String current() throws UnsupportedEncodingException
      Returns the current buffer content as a US-ASCII string.
      Throws:
      UnsupportedEncodingException
    • remember

      private void remember(int c)
      Adds the given byte in the buffer, increasing its capacity if needed.
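The pair current() and remember(int) can be pictured as a small growable byte buffer. The following is a minimal sketch under stated assumptions: the class name, the initial capacity and the doubling strategy are hypothetical, and it uses StandardCharsets.US_ASCII instead of a charset name, which avoids the UnsupportedEncodingException declared by the documented method.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical growable word buffer; not the Apache SIS implementation. */
final class WordBufferSketch {
    /** Storage for the bytes read so far. The initial capacity is arbitrary. */
    private byte[] buffer = new byte[4];

    /** Number of valid bytes in the buffer. */
    private int length;

    /** Appends the given byte, doubling the capacity when the buffer is full. */
    void remember(int c) {
        if (length == buffer.length) {
            buffer = Arrays.copyOf(buffer, length * 2);
        }
        buffer[length++] = (byte) c;
    }

    /** Returns the valid part of the buffer decoded as US-ASCII. */
    String current() {
        return new String(buffer, 0, length, StandardCharsets.US_ASCII);
    }

    /** Forgets the current word so the buffer can be reused. */
    void clear() {
        length = 0;
    }
}
```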
    • read

      abstract int read() throws IOException
      Reads a single byte or character, or -1 if we reached the end of the stream portion that we are allowed to read. We are typically not allowed to read the full stream because only a limited number of bytes is cached. This method may return a Unicode code point (i.e. the returned value may not fit in char).
      Returns:
      the character, or -1 on end of stream window.
      Throws:
      IOException - if an error occurred while reading the byte or character.
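One possible shape of a read() implementation over a ByteBuffer is sketched below. The class name is hypothetical and the code is only an assumption about how such an implementation could behave: it returns unsigned byte values, returns -1 at the buffer limit, and raises the insufficientBytes flag when the limit is below the capacity.

```java
import java.nio.ByteBuffer;

/** Hypothetical ByteBuffer-backed reader; not the Apache SIS implementation. */
final class ByteBufferReaderSketch {
    /** The window of bytes that we are allowed to read. */
    private final ByteBuffer input;

    /** Set to true if the limit was reached while the buffer could hold more bytes. */
    boolean insufficientBytes;

    ByteBufferReaderSketch(ByteBuffer input) {
        this.input = input;
    }

    /** Returns the next byte as a value in the 0 to 255 range, or -1 at the buffer limit. */
    int read() {
        if (input.hasRemaining()) {
            return input.get() & 0xFF;      // Mask to avoid negative values for bytes >= 0x80.
        }
        // More bytes could be loaded only if the limit is below the capacity.
        insufficientBytes = (input.limit() != input.capacity());
        return -1;
    }
}
```

Masking with 0xFF matters because -1 is reserved for the end of the stream window; without it, the byte 0xFF would be indistinguishable from that marker.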
    • readAfter

      private int readAfter(int search) throws IOException
      Skips all bytes or characters up to search, then returns the character after it. Characters inside quotes will be ignored.
      Parameters:
      search - the byte or character to skip.
      Returns:
      the byte or character after search, or -1 on end of stream window.
      Throws:
      IOException - if an error occurred while reading the bytes or characters.
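The behavior described for readAfter(int) can be sketched as a loop that tracks whether the current position is inside quotes. This is an illustrative reconstruction, not the Apache SIS source: it reads from a java.io.Reader instead of the abstract read() method, and it assumes that only double quotes delimit quoted text.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical stand-alone version of the skipping logic; not part of Apache SIS. */
final class ReadAfterSketch {
    /**
     * Skips characters up to the first unquoted occurrence of {@code search},
     * then returns the character after it, or -1 if the input ends first.
     * Assumption: only the double quote character opens and closes quoted text.
     */
    static int readAfter(Reader in, int search) throws IOException {
        boolean inQuotes = false;
        int c;
        while ((c = in.read()) >= 0) {
            if (c == '"') {
                inQuotes = !inQuotes;           // Entering or leaving a quoted value.
            } else if (c == search && !inQuotes) {
                return in.read();               // The character following the match.
            }
        }
        return -1;
    }
}
```

Ignoring quoted text prevents a '>' inside an attribute value from being mistaken for the end of a tag.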
    • afterSpaces

      private int afterSpaces(int c) throws IOException
      If the given character is a space, skips it and all following spaces. Returns the first non-space character.

      For the purpose of this method, a "space" is considered to be the ' ' character and all control characters (characters below 32, which include tabulations and line feeds). This is the same criterion as String.trim(), but does not include Unicode spaces.

      Returns:
      the first non-space character, or -1 on end of stream window.
      Throws:
      IOException - if an error occurred while reading the bytes or characters.
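The space criterion above amounts to a single comparison against ' '. The sketch below is a hypothetical stand-alone version that reads from a java.io.Reader rather than from the abstract read() method.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical stand-alone version of the space skipping logic; not part of Apache SIS. */
final class AfterSpacesSketch {
    /**
     * If {@code c} is a space or a control character (code 32 or below), reads
     * until a character above 32 is found. Returns that character, or -1 if the
     * input ends first. If {@code c} is not a space, nothing is read.
     */
    static int afterSpaces(Reader in, int c) throws IOException {
        while (c >= 0 && c <= ' ') {
            c = in.read();
        }
        return c;
    }
}
```

The `c >= 0` test is what keeps the end-of-stream marker (-1) from being treated as a space.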
    • matches

      private int matches(byte[] word, int n, int c, char separator) throws IOException
      Skips the spaces if any, then the given characters, then the spaces, then the given separator. After this method call, the stream position is on the first character after the separator if a match has been found, or after the first unknown character otherwise.
      Parameters:
      word - the word to search, as US-ASCII characters.
      n - number of valid characters in word.
      c - value of afterSpaces(read()).
      separator - the ':' or '=' character.
      Returns:
      1 if a match is found, 0 if no match, or -1 on end of stream window.
      Throws:
      IOException - if an error occurred while reading the bytes or characters.
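The three-valued contract of matches can be sketched as follows. This is an assumption-based reconstruction that reads from a java.io.Reader instead of the abstract read() method; the class name is hypothetical, and the caller is expected to pass the first non-space character as c, as the parameter description states.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical stand-alone version of the word matching logic; not part of Apache SIS. */
final class MatchesSketch {
    /** Skips spaces and control characters, returning the first other character or -1. */
    static int afterSpaces(Reader in, int c) throws IOException {
        while (c >= 0 && c <= ' ') {
            c = in.read();
        }
        return c;
    }

    /**
     * Compares the input, starting with the already read character {@code c},
     * against the first {@code n} bytes of {@code word}, then skips spaces and
     * expects {@code separator}.
     *
     * @return 1 on match, 0 on mismatch, -1 if the input ended.
     */
    static int matches(Reader in, byte[] word, int n, int c, char separator) throws IOException {
        for (int i = 0; i < n; i++) {
            if (c != word[i]) {
                return (c >= 0) ? 0 : -1;
            }
            c = in.read();
        }
        c = afterSpaces(in, c);
        if (c == separator) {
            return 1;
        }
        return (c >= 0) ? 0 : -1;
    }
}
```

A typical use would be testing whether an attribute name is "xmlns" followed by '=' (default namespace) or by ':' (prefixed namespace declaration).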
    • getMimeType

      final String getMimeType() throws IOException
      Returns the MIME type, or null if unknown. The caller shall have already skipped the "<?xml " characters before invoking this method.
      Throws:
      IOException - if an error occurred while reading the bytes or characters.
    • probeContent

      final ProbeResult probeContent() throws DataStoreException
      Wraps the call to getMimeType() for catching IOException and for instantiating the ProbeResult.
      Throws:
      DataStoreException
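The wrapping performed by probeContent() might look roughly like the sketch below. Everything in it is a stand-in chosen for illustration: the Status enum replaces ProbeResult, UncheckedIOException replaces DataStoreException, and the Detector interface is hypothetical. Only the INSUFFICIENT_BYTES outcome is taken from the documentation of insufficientBytes; the other two outcomes are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the probe wrapper; does not use the Apache SIS types. */
final class ProbeSketch {
    /** Stand-in for ProbeResult. The SUPPORTED and UNSUPPORTED names are assumptions. */
    enum Status { SUPPORTED, UNSUPPORTED, INSUFFICIENT_BYTES }

    /** Hypothetical minimal view of a detector, used only by this sketch. */
    interface Detector {
        String getMimeType() throws IOException;
        boolean insufficientBytes();
    }

    /** Creates a detector with fixed answers, for demonstration purposes. */
    static Detector fixed(final String mimeType, final boolean insufficient) {
        return new Detector() {
            @Override public String getMimeType() { return mimeType; }
            @Override public boolean insufficientBytes() { return insufficient; }
        };
    }

    /** Converts the detector outcome to a status, wrapping any I/O failure. */
    static Status probe(Detector detector) {
        final String mimeType;
        try {
            mimeType = detector.getMimeType();
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // Stand-in for DataStoreException.
        }
        if (mimeType != null) {
            return Status.SUPPORTED;
        }
        return detector.insufficientBytes() ? Status.INSUFFICIENT_BYTES : Status.UNSUPPORTED;
    }
}
```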