Package org.htmlunit.cyberneko
Class HTMLNamedEntitiesParser
- java.lang.Object
-
- org.htmlunit.cyberneko.HTMLNamedEntitiesParser
-
public final class HTMLNamedEntitiesParser extends java.lang.Object
This is a very specialized class for recognizing HTML named entities, with the ability to look them up in stages. It is stateless and hence memory friendly. It is not generated code; rather, it sets itself up from a file at first use and stays fixed from then on. Technically, it is no longer a parser, because it does not keep a state that matches the HTML standard (12.2.5.72 Character reference state). Because it is stateless, it delegates state handling to the caller, who has to track how many characters have been seen and decide when to stop.
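The staged lookup described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the HtmlUnit implementation: the names EntityTrieSketch, Node, step and resolved are hypothetical, the entity table is passed in by hand instead of being read from a file, and a failed step returns null here, which may differ from what the real class returns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a staged, stateless entity lookup.
final class EntityTrieSketch {
    /** One level of the tree; it is not modified after the trie is built. */
    static final class Node {
        final Map<Integer, Node> next = new HashMap<>();
        String resolved; // non-null when the path to this node is a complete entity name
    }

    private final Node root = new Node();

    EntityTrieSketch(Map<String, String> entities) {
        for (Map.Entry<String, String> e : entities.entrySet()) {
            Node n = root;
            String name = e.getKey();
            for (int i = 0; i < name.length(); i++) {
                n = n.next.computeIfAbsent((int) name.charAt(i), k -> new Node());
            }
            n.resolved = e.getValue();
        }
    }

    /**
     * Advances one character. The trie keeps no cursor of its own: the caller
     * passes in the previous level (or null to start) and gets the next one back.
     * Returns null when no known entity continues with this character.
     */
    Node step(int character, Node state) {
        Node current = (state == null) ? root : state;
        return current.next.get(character);
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("lt", "<");
        table.put("lt;", "<");
        EntityTrieSketch trie = new EntityTrieSketch(table);
        Node s = trie.step('l', null);
        s = trie.step('t', s);
        System.out.println(s.resolved);
    }
}
```

Because all progress lives in the Node reference held by the caller, a single instance of such a structure can serve any number of lookups at once.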
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
protected static class
HTMLNamedEntitiesParser.RootState
This is our initial state and has a special optimization applied.
static class
HTMLNamedEntitiesParser.State
Our "level" in the tree-like structure that keeps its static state and the next level underneath.
-
Field Summary
Fields (Modifier and Type, Field, Description)
private FastHashMap<java.lang.String,java.lang.String>
entities_
private static HTMLNamedEntitiesParser
instance
private HTMLNamedEntitiesParser.RootState
rootLevel_
-
Constructor Summary
Constructors (Modifier, Constructor, Description)
private
HTMLNamedEntitiesParser()
Constructor.
-
Method Summary
Methods (Modifier and Type, Method, Description)
static HTMLNamedEntitiesParser
get()
Returns the singleton.
HTMLNamedEntitiesParser.State
lookup(int character, HTMLNamedEntitiesParser.State state)
Pseudo-parses an entity character by character.
HTMLNamedEntitiesParser.State
lookup(java.lang.String entityName)
Utility method, mostly for testing, that allows us to look up an entity from a string instead of from single characters.
java.lang.String
lookupEntityRefFor(java.lang.String key)
-
-
-
Field Detail
-
instance
private static final HTMLNamedEntitiesParser instance
-
rootLevel_
private HTMLNamedEntitiesParser.RootState rootLevel_
-
entities_
private FastHashMap<java.lang.String,java.lang.String> entities_
-
-
Method Detail
-
get
public static HTMLNamedEntitiesParser get()
Returns the singleton. The singleton is stateless and can safely be used in a multi-threaded context.
- Returns:
- the singleton instance of the parser, can never be null
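As a rough illustration of why a stateless singleton is safe to share between threads, the following sketch uses a hypothetical stand-in class (StatelessSingletonSketch and resolve are not part of the library). It assumes the lookup table is filled once during construction and never modified afterwards, which is what the class description suggests but is simplified here to a hand-written map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in; the class and method names are hypothetical.
final class StatelessSingletonSketch {
    // Created once when the class is initialized; the JVM runs class
    // initialization exactly once, even under concurrent first access.
    private static final StatelessSingletonSketch INSTANCE = new StatelessSingletonSketch();

    // Read-only after construction, so concurrent readers need no locking.
    private final Map<String, String> table;

    private StatelessSingletonSketch() {
        Map<String, String> m = new HashMap<>();
        m.put("amp;", "&");
        m.put("lt;", "<");
        table = Collections.unmodifiableMap(m);
    }

    static StatelessSingletonSketch get() {
        return INSTANCE;
    }

    /** Depends only on its argument; nothing is stored in the instance per call. */
    String resolve(String name) {
        return table.get(name);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(get().resolve("amp;"));
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```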
-
lookup
public HTMLNamedEntitiesParser.State lookup(java.lang.String entityName)
Utility method, mostly for testing, that allows us to look up an entity from a string instead of from single characters.
- Parameters:
entityName - the entity to look up
- Returns:
- a state that represents the result, will never be null
-
lookup
public HTMLNamedEntitiesParser.State lookup(int character, HTMLNamedEntitiesParser.State state)
Pseudo-parses an entity character by character. We assume that we are presented with the characters after the starting ampersand. This parser does not support Unicode entities, so these have to be handled separately.
- Parameters:
character - the next character; should never be the ampersand
state - the last known state, or null when starting to parse
- Returns:
- the current state, which might be a valid final result; see HTMLNamedEntitiesParser.State
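Since the state handling is left to the caller, the caller's side of a character-by-character lookup might look roughly like the loop below. This is a sketch under assumptions, not the library's code: CallerLoopSketch, isPrefix and longestMatch are hypothetical names, the four-entry table is a tiny hand-picked excerpt, and the prefix check is a plain scan over that table rather than a tree walk. What it shows is the bookkeeping itself: feed the characters after the ampersand one at a time, remember the last complete match, and stop once no entity can continue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side loop; it does not use the real parser.
final class CallerLoopSketch {
    // Tiny hand-picked excerpt of an entity table, for illustration only.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("not", "\u00AC");
        ENTITIES.put("not;", "\u00AC");
        ENTITIES.put("notin;", "\u2209");
        ENTITIES.put("amp;", "&");
    }

    /** True if at least one known entity name starts with the candidate. */
    static boolean isPrefix(String candidate) {
        for (String name : ENTITIES.keySet()) {
            if (name.startsWith(candidate)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns {matchedName, replacement} for the longest entity found at the
     * start of the text following the ampersand, or null if there is none.
     */
    static String[] longestMatch(String textAfterAmpersand) {
        StringBuilder seen = new StringBuilder();
        String[] best = null;
        for (int i = 0; i < textAfterAmpersand.length(); i++) {
            seen.append(textAfterAmpersand.charAt(i));
            String candidate = seen.toString();
            if (!isPrefix(candidate)) {
                break; // no entity continues this way, so stop consuming
            }
            String value = ENTITIES.get(candidate);
            if (value != null) {
                best = new String[] {candidate, value}; // remember the last full match
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] r = longestMatch("notit;");
        System.out.println(r[0] + " -> " + r[1]);
    }
}
```

With the input "notit;" the loop passes through the complete match "not", keeps going while "noti" could still become "notin;", and falls back to "not" once "notit" turns out to lead nowhere.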
-
lookupEntityRefFor
public java.lang.String lookupEntityRefFor(java.lang.String key)
- Returns:
- the entity ref for the given key (usually a single char) or null
-
-