Class MimeTypeDetector
java.lang.Object
org.apache.sis.internal.storage.xml.MimeTypeDetector
Detects the MIME type of an XML document from the namespace of the root element.
This class does not support encoding: it will search only for US-ASCII characters.
It does not prevent usage with encodings like ISO-LATIN-1 or UTF-8, provided that
the characters in the [32 … 122] range (from space to 'z') are the same and cannot
be used as part of a multi-byte character.
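The encoding assumption above can be checked directly. The following is a minimal sketch, not part of the documented class: the class name `AsciiRangeSketch` and its method are hypothetical. It verifies that every character in the [32 … 122] range encodes to the same single byte in US-ASCII, ISO-8859-1 (ISO-LATIN-1) and UTF-8.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, for illustration only. */
final class AsciiRangeSketch {
    /**
     * Returns true if every character from space (32) to 'z' (122)
     * is encoded as the same single byte in the three charsets.
     */
    static boolean sameInAllCharsets() {
        for (char c = 32; c <= 122; c++) {
            String s = String.valueOf(c);
            byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
            byte[] latin = s.getBytes(StandardCharsets.ISO_8859_1);
            byte[] utf8  = s.getBytes(StandardCharsets.UTF_8);
            // Each encoding must produce exactly one byte...
            if (ascii.length != 1 || latin.length != 1 || utf8.length != 1) {
                return false;
            }
            // ...and that byte must be the character code itself.
            if (ascii[0] != c || latin[0] != c || utf8[0] != c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameInAllCharsets());
    }
}
```

In UTF-8, the bytes of a multi-byte sequence all have their high bit set, so a byte in this range can never be part of one.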
This class tries to implement a lightweight detection mechanism. We cannot, for instance, unmarshal the whole document with JAXB and look at the class of the unmarshalled object, since that would be far too heavy.
Since:
0.4
Version:
1.0
Field Summary
Fields
Modifier and Type, Field: Description
private byte[] buffer: A buffer for reading a word from the XML document, assumed using US-ASCII characters.
(package private) boolean insufficientBytes: Set to true when read() implementations reached the ByteBuffer limit, but the buffer has enough capacity for more bytes.
private int length: Number of valid characters in buffer string.
private static final int MAX_ASCII: The maximal US-ASCII value, inclusive.
mimeForNameSpaces: The mapping from XML namespaces to MIME types.
mimeForRootElements: The mapping from root elements to MIME types.
private static final byte[] XMLNS: The "xmlns" string as a sequence of bytes.
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method: Description
private int afterSpaces(int c): If the given character is a space, skips it and all following spaces.
private String current(): Returns the current buffer content as a US-ASCII string.
(package private) final String getMimeType: Returns the MIME type, or null if unknown.
private int matches(byte[] word, int n, int c, char separator): Skips the spaces if any, then the given characters, then the spaces, then the given separator.
(package private) final ProbeResult probeContent()
(package private) abstract int read(): Reads a single byte or character, or -1 if we reached the end of the stream portion that we are allowed to read.
private int readAfter(int search): Skips all bytes or characters up to search, then returns the character after it.
private void remember(int c): Adds the given byte in the buffer, increasing its capacity if needed.
Field Details
mimeForNameSpaces
The mapping from XML namespaces to MIME types. This map shall be read-only, since we do not synchronize it.
mimeForRootElements
The mapping from root elements to MIME types. Used only if the root element is in the default namespace and contains no xmlns attributes for that namespace.
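The relationship between the two maps can be sketched as follows. This is an illustration under stated assumptions, not the class's actual lookup code: the class `MimeLookupSketch`, its `lookup` method, and the namespace, element and MIME type values used with it are all hypothetical. The maps are copied into immutable maps so that they can be read without synchronization, and the root element map is consulted only when no namespace was found.

```java
import java.util.Map;

/** Hypothetical illustration of a namespace lookup with a root element fallback. */
final class MimeLookupSketch {
    private final Map<String,String> mimeForNameSpaces;
    private final Map<String,String> mimeForRootElements;

    MimeLookupSketch(Map<String,String> mimeForNameSpaces,
                     Map<String,String> mimeForRootElements)
    {
        // Immutable copies: safe to read from many threads without locks.
        this.mimeForNameSpaces   = Map.copyOf(mimeForNameSpaces);
        this.mimeForRootElements = Map.copyOf(mimeForRootElements);
    }

    /**
     * Returns the MIME type, or null if unknown.
     *
     * @param namespace   namespace of the root element, or null if none was declared.
     * @param rootElement local name of the root element (must not be null).
     */
    String lookup(String namespace, String rootElement) {
        if (namespace != null) {
            return mimeForNameSpaces.get(namespace);
        }
        // Fallback used only when the root element declares no namespace.
        return mimeForRootElements.get(rootElement);
    }
}
```

Whether the real class falls back to the root element when a declared namespace is simply unknown is not stated here; the sketch returns null in that case.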
XMLNS
private static final byte[] XMLNS
The "xmlns" string as a sequence of bytes.
MAX_ASCII
private static final int MAX_ASCII
The maximal US-ASCII value, inclusive.
buffer
private byte[] buffer
A buffer for reading a word from the XML document, assumed using US-ASCII characters.
length
private int length
Number of valid characters in buffer string.
insufficientBytes
boolean insufficientBytes
Set to true when read() implementations reached the ByteBuffer limit, but the buffer has enough capacity for more bytes. In such a case the probeContent() method will return ProbeResult.INSUFFICIENT_BYTES, which means that the method requests more bytes for detecting the MIME type.
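A read() implementation backed by a ByteBuffer could set this flag as sketched below. This is an assumption about how a subclass might behave, not the actual implementation: the class `WindowReaderSketch` is hypothetical and the mapping to ProbeResult.INSUFFICIENT_BYTES is left out so that the example has no dependency beyond the JDK.

```java
import java.nio.ByteBuffer;

/** Hypothetical reader over the bytes currently cached in a ByteBuffer. */
final class WindowReaderSketch {
    private final ByteBuffer buffer;

    /** True if the limit was reached while the buffer could still hold more bytes. */
    boolean insufficientBytes;

    WindowReaderSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Returns the next byte as an unsigned value, or -1 at the end of the window. */
    int read() {
        if (buffer.hasRemaining()) {
            return buffer.get() & 0xFF;
        }
        // Limit reached: more bytes could be requested only if
        // the buffer is not already filled up to its capacity.
        insufficientBytes = buffer.limit() < buffer.capacity();
        return -1;
    }
}
```

With an 8-byte buffer holding a single byte, the second call to read() returns -1 and sets the flag; with a buffer filled to capacity, the flag stays false.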
Constructor Details
MimeTypeDetector
Creates a new instance.
Parameters:
mimeForNameSpaces - the mapping from XML namespaces to MIME types.
mimeForRootElements - the mapping from root elements to MIME types, used only as a fallback.
Method Details
current
Returns the current buffer content as a US-ASCII string.
Throws:
UnsupportedEncodingException
remember
private void remember(int c)
Adds the given byte in the buffer, increasing its capacity if needed.
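The interplay of buffer, length, remember and current can be sketched as below. The class `WordBufferSketch`, the initial capacity of 4 and the doubling strategy are assumptions made for illustration. The sketch also uses StandardCharsets.US_ASCII, whereas the documented current() declares UnsupportedEncodingException, which suggests it names the charset by string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical growable buffer for one US-ASCII word. */
final class WordBufferSketch {
    /** Storage for the word; the initial capacity is an arbitrary choice. */
    private byte[] buffer = new byte[4];

    /** Number of valid bytes in the buffer. */
    private int length;

    /** Appends the given byte, doubling the capacity when the buffer is full. */
    void remember(int c) {
        if (length == buffer.length) {
            buffer = Arrays.copyOf(buffer, length * 2);
        }
        buffer[length++] = (byte) c;
    }

    /** Returns the valid part of the buffer as a US-ASCII string. */
    String current() {
        return new String(buffer, 0, length, StandardCharsets.US_ASCII);
    }
}
```

Remembering the nine characters of `xmlns:gml` one by one forces the sketch to grow twice (4, 8, then 16 bytes) before current() returns the word.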
read
Reads a single byte or character, or -1 if we reached the end of the stream portion that we are allowed to read. We are typically not allowed to read the full stream because only a limited number of bytes is cached. This method may return a Unicode code point (i.e. the returned value may not fit in char).
Returns:
the character, or -1 on end of stream window.
Throws:
IOException - if an error occurred while reading the byte or character.
readAfter
Skips all bytes or characters up to search, then returns the character after it. Characters inside quotes will be ignored.
Parameters:
search - the byte or character to skip.
Returns:
the byte or character after search, or -1 on end of stream window.
Throws:
IOException - if an error occurred while reading the bytes or characters.
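The quote handling described for readAfter can be sketched as follows. This is one possible reading, not the actual implementation: the class `ReadAfterSketch` and its array-backed read() are hypothetical, and only the double quote character is treated as a quote, which is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical reader over a US-ASCII string, for illustration only. */
final class ReadAfterSketch {
    private final byte[] data;
    private int position;

    ReadAfterSketch(String ascii) {
        data = ascii.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the next byte, or -1 at the end of the data. */
    int read() {
        return (position < data.length) ? (data[position++] & 0xFF) : -1;
    }

    /**
     * Skips everything up to the first unquoted occurrence of the
     * given character, then returns the character after it.
     */
    int readAfter(int search) {
        boolean isQuoting = false;
        int c;
        while ((c = read()) >= 0) {
            if (c == '"') {
                isQuoting = !isQuoting;       // Entering or leaving a quoted text.
            } else if (c == search && !isQuoting) {
                return read();
            }
        }
        return -1;                            // End of window before any match.
    }
}
```

For example, searching for `>` in `a=">" >b` skips the quoted `>` and returns `b`.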
afterSpaces
If the given character is a space, skips it and all following spaces. Returns the first non-space character.
For the purpose of this method, a "space" is considered to be the ' ' character and all control characters (characters below 32, which include tabulations and line feeds). This is the same criterion as String.trim(), but does not include Unicode spaces.
Returns:
the first non-space character, or -1 on end of stream window.
Throws:
IOException - if an error occurred while reading the bytes or characters.
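The space criterion above amounts to a comparison with ' ' (code 32). A minimal sketch, assuming a hypothetical `AfterSpacesSketch` class with an array-backed read() in place of the abstract method:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical reader over a US-ASCII string, for illustration only. */
final class AfterSpacesSketch {
    private final byte[] data;
    private int position;

    AfterSpacesSketch(String ascii) {
        data = ascii.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the next byte, or -1 at the end of the data. */
    int read() {
        return (position < data.length) ? (data[position++] & 0xFF) : -1;
    }

    /**
     * Returns the given character if it is not a space, otherwise the
     * first non-space character read afterward, or -1 at the end of the data.
     */
    int afterSpaces(int c) {
        // Codes 0 to 32 are treated as spaces; -1 means end of window.
        while (c >= 0 && c <= ' ') {
            c = read();
        }
        return c;
    }
}
```

A non-space argument is returned unchanged and nothing is consumed from the stream.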
matches
Skips the spaces if any, then the given characters, then the spaces, then the given separator. After this method call, the stream position is on the first character after the separator if a match has been found, or after the first unknown character otherwise.
Parameters:
word - the word to search, as US-ASCII characters.
n - number of valid characters in word.
c - value of afterSpaces(read()).
separator - the ':' or '=' character.
Returns:
1 if a match is found, 0 if no match, or -1 on end of stream window.
Throws:
IOException - if an error occurred while reading the bytes or characters.
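The three-valued result of matches can be sketched as below. This is a reconstruction from the description only, not the actual code: the class `MatchesSketch` and its array-backed read() are hypothetical. The caller skips the leading spaces by passing afterSpaces(read()) as the c argument.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical reader over a US-ASCII string, for illustration only. */
final class MatchesSketch {
    private final byte[] data;
    private int position;

    MatchesSketch(String ascii) {
        data = ascii.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns the next byte, or -1 at the end of the data. */
    int read() {
        return (position < data.length) ? (data[position++] & 0xFF) : -1;
    }

    /** Skips spaces and control characters, starting with the given character. */
    int afterSpaces(int c) {
        while (c >= 0 && c <= ' ') {
            c = read();
        }
        return c;
    }

    /** Returns 1 on match, 0 on mismatch, or -1 if the data ended too early. */
    int matches(byte[] word, int n, int c, char separator) {
        for (int i = 0; i < n; i++) {
            if (c < 0) return -1;           // Ran out of bytes inside the word.
            if (c != word[i]) return 0;     // Position is after the unknown character.
            c = read();
        }
        c = afterSpaces(c);                 // Spaces between the word and the separator.
        if (c < 0) return -1;
        return (c == separator) ? 1 : 0;
    }
}
```

For the input `xmlns = "http"`, matching the bytes of `xmlns` with the `=` separator returns 1, and the next read() returns the character that follows the `=`.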
getMimeType
Returns the MIME type, or null if unknown. The caller shall have already skipped the "<?xml " characters before invoking this method.
Throws:
IOException - if an error occurred while reading the bytes or characters.
probeContent
Throws:
DataStoreException