Package org.languagetool.tools
Class StringTools
- java.lang.Object
-
- org.languagetool.tools.StringTools
-
public final class StringTools extends java.lang.Object
Tools for working with strings.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class
StringTools.ApiPrintMode
Constants for printing XML rule matches.
-
Field Summary
Fields
Modifier and Type | Field | Description
static java.util.Set<java.lang.String>
LOWERCASE_GREEK_LETTERS
static java.util.Set<java.lang.String>
UPPERCASE_GREEK_LETTERS
private static java.util.regex.Pattern
XML_COMMENT_PATTERN
private static java.util.regex.Pattern
XML_PATTERN
-
Constructor Summary
Constructors
Modifier | Constructor | Description
private
StringTools()
-
Method Summary
Modifier and Type | Method | Description
static java.lang.String
addSpace(java.lang.String word, Language language)
Adds spaces before words that are not punctuation.
static @Nullable java.lang.String
asString(java.lang.CharSequence s)
static void
assureSet(java.lang.String s, java.lang.String varName)
Throws an exception if the given string is null, empty, or only whitespace.
private static @Nullable java.lang.String
changeFirstCharCase(java.lang.String str, boolean toUpperCase)
Returns str modified so that its first character is now a lowercase or uppercase character, depending on toUpperCase.
static java.lang.String
escapeForXmlAttribute(java.lang.String s)
static java.lang.String
escapeForXmlContent(java.lang.String s)
static java.lang.String
escapeHTML(java.lang.String s)
Escapes these characters: less than, greater than, quote, ampersand.
static java.lang.String
escapeXML(java.lang.String s)
Calls escapeHTML(String).
static java.lang.String
filterXML(java.lang.String str)
Simple XML filtering for XML tags.
static boolean
isAllUppercase(java.lang.String str)
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean
isCapitalizedWord(java.lang.String str)
static boolean
isEmpty(java.lang.String str)
Helper method to replace calls to "".equals().
static boolean
isMixedCase(java.lang.String str)
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
static boolean
isNonBreakingWhitespace(java.lang.String str)
Checks if a string is the non-breaking whitespace.
static boolean
isNotAllLowercase(java.lang.String str)
Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean
isParagraphEnd(java.lang.String sentence, boolean singleLineBreaksMarksPara)
static boolean
isPositiveNumber(char ch)
static boolean
isWhitespace(java.lang.String str)
Checks whether a string is a whitespace character, including all Unicode whitespace, the non-breaking space (U+00A0), the narrow non-breaking space (U+202F), and the zero width space (U+200B), used in Khmer.
static java.util.List<java.lang.String>
loadLines(java.lang.String path)
Loads file, ignoring comments (lines starting with #).
static @Nullable java.lang.String
lowercaseFirstChar(java.lang.String str)
Returns str modified so that its first character is now a lowercase character.
static java.lang.String
readerToString(java.io.Reader reader)
static java.lang.String
readStream(java.io.InputStream stream, java.lang.String encoding)
Reads the text stream using the given encoding.
static boolean
startsWithUppercase(java.lang.String str)
Whether the first character of str is an uppercase character.
static java.lang.String
streamToString(java.io.InputStream is, java.lang.String charsetName)
static java.lang.String
trimSpecialCharacters(java.lang.String s)
Eliminates special (Unicode) characters, e.g. soft hyphens.
static java.lang.String
trimWhitespace(java.lang.String s)
Filters any whitespace characters.
static @Nullable java.lang.String
uppercaseFirstChar(java.lang.String str)
Returns str modified so that its first character is now an uppercase character.
static @Nullable java.lang.String
uppercaseFirstChar(java.lang.String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
-
-
-
Field Detail
-
XML_COMMENT_PATTERN
private static final java.util.regex.Pattern XML_COMMENT_PATTERN
-
XML_PATTERN
private static final java.util.regex.Pattern XML_PATTERN
-
UPPERCASE_GREEK_LETTERS
public static final java.util.Set<java.lang.String> UPPERCASE_GREEK_LETTERS
-
LOWERCASE_GREEK_LETTERS
public static final java.util.Set<java.lang.String> LOWERCASE_GREEK_LETTERS
-
-
Method Detail
-
assureSet
public static void assureSet(java.lang.String s, java.lang.String varName)
Throws an exception if the given string is null, empty, or only whitespace.
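The check described above can be sketched in plain Java as follows. This is a minimal illustration, not LanguageTool's implementation: the class name `AssureSetSketch`, the exception type, and the message text are all assumptions, since the documentation only says that an exception is thrown.

```java
final class AssureSetSketch {

    /**
     * Rejects null, empty, and whitespace-only values.
     * Assumption: IllegalArgumentException is used for all three cases.
     */
    static void assureSet(String s, String varName) {
        if (s == null || s.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    varName + " must not be null, empty, or whitespace-only");
        }
    }

    public static void main(String[] args) {
        assureSet("en-US", "langCode"); // a non-blank value passes silently
        try {
            assureSet("   ", "langCode");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing `varName` lets the error message name the offending argument, which is presumably why the method takes it.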
-
readStream
public static java.lang.String readStream(java.io.InputStream stream, java.lang.String encoding) throws java.io.IOException
Reads the text stream using the given encoding.
- Parameters:
stream - the InputStream to be read
encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
- Returns:
- a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
- Throws:
java.io.IOException
- Since:
- 2.3
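The line-ending behavior noted under Returns is easy to miss, so here is a small sketch that reproduces it with standard JDK classes. It is an illustration under assumptions, not the library's code: the class name `ReadStreamSketch` is hypothetical, "system encoding" is taken to mean `Charset.defaultCharset()`, and closing the stream inside the method is a choice made for this sketch.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadStreamSketch {

    /** Reads all lines and terminates each one, including the last, with '\n'. */
    static String readStream(InputStream stream, String encoding) throws IOException {
        // Assumption: a null encoding falls back to the JVM default charset.
        Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the original terminator, so "\r\n" also becomes "\n".
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("one\ntwo".getBytes(StandardCharsets.UTF_8));
        System.out.print(readStream(in, "utf-8"));
    }
}
```

Because every line gets a trailing `\n`, the returned string can be one character longer than the stream's decoded content.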
-
isAllUppercase
public static boolean isAllUppercase(java.lang.String str)
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
-
isMixedCase
public static boolean isMixedCase(java.lang.String str)
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
- Parameters:
str - input string
-
isNotAllLowercase
public static boolean isNotAllLowercase(java.lang.String str)
Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
- Since:
- 2.5
-
isCapitalizedWord
public static boolean isCapitalizedWord(java.lang.String str)
- Parameters:
str - input string
- Returns:
- true if word starts with an uppercase letter and all other letters are lowercase
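The case predicates described above (isAllUppercase, isMixedCase, isNotAllLowercase, isCapitalizedWord) fit together in a way that a short sketch may make clearer. The code below is one possible reading of the descriptions, written against `java.lang.Character` only; the class name `CaseChecksSketch` is hypothetical, and defining mixed case as "none of the other three shapes" is an assumption that happens to match the documented examples.

```java
final class CaseChecksSketch {

    /** True if no letter is lowercase; characters without case are ignored. */
    static boolean isAllUppercase(String str) {
        return str.chars().noneMatch(c -> Character.isLetter(c) && Character.isLowerCase(c));
    }

    /** True if at least one letter is not lowercase. */
    static boolean isNotAllLowercase(String str) {
        return str.chars().anyMatch(c -> Character.isLetter(c) && !Character.isLowerCase(c));
    }

    /** True if the word starts with an uppercase letter and all other letters are lowercase. */
    static boolean isCapitalizedWord(String str) {
        if (str == null || str.isEmpty() || !Character.isUpperCase(str.charAt(0))) {
            return false;
        }
        return str.chars().skip(1)
                .noneMatch(c -> Character.isLetter(c) && !Character.isLowerCase(c));
    }

    /** Assumption: mixed case means not all-uppercase, not capitalized, and not all-lowercase. */
    static boolean isMixedCase(String str) {
        return !isAllUppercase(str) && !isCapitalizedWord(str) && isNotAllLowercase(str);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"MixedCase", "mixedCase", "Mixedcase", "UPPER", "lower"}) {
            System.out.println(s + " -> mixed case: " + isMixedCase(s));
        }
    }
}
```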
-
startsWithUppercase
public static boolean startsWithUppercase(java.lang.String str)
Whether the first character of str is an uppercase character.
-
uppercaseFirstChar
@Nullable public static @Nullable java.lang.String uppercaseFirstChar(java.lang.String str)
Returns str modified so that its first character is now an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
-
uppercaseFirstChar
@Nullable public static @Nullable java.lang.String uppercaseFirstChar(java.lang.String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
- Parameters:
language - the language, will be ignored if it's null
- Since:
- 2.7
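The Dutch special case can be illustrated with a small stand-alone sketch. Several things here are assumptions: the class name `DutchCapitalizationSketch` is hypothetical, a plain language code string (`"nl"`) stands in for the `Language` parameter, and the fallback simply uppercases the first character without skipping leading punctuation.

```java
import java.util.Locale;

final class DutchCapitalizationSketch {

    /**
     * languageCode is a hypothetical stand-in for the Language parameter;
     * null means no language-specific handling.
     */
    static String uppercaseFirstChar(String str, String languageCode) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        // Dutch treats the digraph "ij" as one letter, so both characters are capitalized.
        if ("nl".equals(languageCode) && str.toLowerCase(Locale.ROOT).startsWith("ij")) {
            return "IJ" + str.substring(2);
        }
        // Simplified fallback: uppercase only the very first character.
        return Character.toUpperCase(str.charAt(0)) + str.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(uppercaseFirstChar("ijsselmeer", "nl"));
        System.out.println(uppercaseFirstChar("ijsselmeer", null));
    }
}
```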
-
lowercaseFirstChar
@Nullable public static @Nullable java.lang.String lowercaseFirstChar(java.lang.String str)
Returns str modified so that its first character is now a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
-
changeFirstCharCase
@Nullable private static @Nullable java.lang.String changeFirstCharCase(java.lang.String str, boolean toUpperCase)
Returns str modified so that its first character is now a lowercase or uppercase character, depending on toUpperCase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character.
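The rule about leading non-alphabetic characters can be sketched as a scan for the first letter. This is a minimal reading of the description, with a hypothetical class name `FirstCharCaseSketch`; returning null or empty input unchanged, and leaving strings without any letter untouched, are assumptions.

```java
final class FirstCharCaseSketch {

    /** Changes the case of the first alphabetic character, skipping quotes, parentheses, etc. */
    static String changeFirstCharCase(String str, boolean toUpperCase) {
        if (str == null || str.isEmpty()) {
            return str; // assumption: nothing to change
        }
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isLetter(c)) {
                char changed = toUpperCase ? Character.toUpperCase(c) : Character.toLowerCase(c);
                return str.substring(0, i) + changed + str.substring(i + 1);
            }
        }
        return str; // no alphabetic character found
    }

    public static void main(String[] args) {
        System.out.println(changeFirstCharCase("\"hello\"", true));
        System.out.println(changeFirstCharCase("(World)", false));
    }
}
```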
-
readerToString
public static java.lang.String readerToString(java.io.Reader reader) throws java.io.IOException
- Throws:
java.io.IOException
-
streamToString
public static java.lang.String streamToString(java.io.InputStream is, java.lang.String charsetName) throws java.io.IOException
- Throws:
java.io.IOException
-
escapeXML
public static java.lang.String escapeXML(java.lang.String s)
Calls escapeHTML(String).
-
escapeForXmlAttribute
public static java.lang.String escapeForXmlAttribute(java.lang.String s)
- Since:
- 2.9
-
escapeForXmlContent
public static java.lang.String escapeForXmlContent(java.lang.String s)
- Since:
- 2.9
-
escapeHTML
public static java.lang.String escapeHTML(java.lang.String s)
Escapes these characters: less than, greater than, quote, ampersand.
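A character-by-character pass is one straightforward way to perform the escaping described above. The sketch below is an assumption-laden illustration, not the library's code: the class name `EscapeSketch` is hypothetical, "quote" is read as the double quote, and the standard entity names are assumed as replacements.

```java
final class EscapeSketch {

    /** Replaces <, >, " and & with the corresponding character entities. */
    static String escapeHTML(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHTML("<a href=\"x\">Q&A</a>"));
    }
}
```

Handling each character exactly once avoids the double-escaping that can occur when chained string replacements process `&` after the other characters.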
-
trimWhitespace
public static java.lang.String trimWhitespace(java.lang.String s)
Filters any whitespace characters. Useful for trimming the contents of token elements that cannot possibly contain any spaces, with the exception of a single space in a word (for example, if the language supports numbers formatted with spaces as single tokens, as Catalan does in LanguageTool).
- Parameters:
s - String to be filtered.
- Returns:
- Filtered s.
-
trimSpecialCharacters
public static java.lang.String trimSpecialCharacters(java.lang.String s)
Eliminates special (Unicode) characters, e.g. soft hyphens.
- Parameters:
s - String to filter
- Returns:
- s, with non-(alphanumeric, punctuation, space) characters deleted
- Since:
- 4.3
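One way to express "delete everything that is not alphanumeric, punctuation, or space" is a negated character class with Unicode-aware POSIX classes. The sketch below assumes that approach; the class name `TrimSpecialSketch` and the exact regular expression are illustrative and are not taken from the library.

```java
final class TrimSpecialSketch {

    /** Deletes every character that is not alphanumeric, punctuation, or whitespace. */
    static String trimSpecialCharacters(String s) {
        // (?U) enables UNICODE_CHARACTER_CLASS, so the POSIX classes cover non-ASCII text too.
        return s.replaceAll("(?U)[^\\p{Alnum}\\p{Punct}\\p{Space}]", "");
    }

    public static void main(String[] args) {
        // U+00AD is the soft hyphen, an invisible formatting character.
        String withSoftHyphen = "soft\u00ADhyphen";
        System.out.println(trimSpecialCharacters(withSoftHyphen).length());
    }
}
```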
-
addSpace
public static java.lang.String addSpace(java.lang.String word, Language language)
Adds spaces before words that are not punctuation.
- Parameters:
word - Word to which the preceding space is added.
language - Language of the word (to check typography conventions). Currently, the French convention of omitting the space only before '.' and ',' is implemented; for other languages, no space is added before any of , . ; : ! ?
- Returns:
- String containing a space or an empty string.
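The typography rule above reduces to a lookup on single-character tokens. The following sketch illustrates it under assumptions: the class name `AddSpaceSketch` is hypothetical, and a language code string (`"fr"`) stands in for the `Language` parameter.

```java
final class AddSpaceSketch {

    /** Returns " " if a space should precede the token, "" otherwise. */
    static String addSpace(String word, String languageCode) {
        if (word.length() == 1) {
            // French keeps the space before ; : ! ? and drops it only before . and ,
            String noSpaceBefore = "fr".equals(languageCode) ? ".," : ",.;:!?";
            if (noSpaceBefore.indexOf(word.charAt(0)) >= 0) {
                return "";
            }
        }
        return " ";
    }

    public static void main(String[] args) {
        System.out.println("Quoi" + addSpace("!", "fr") + "!");
        System.out.println("What" + addSpace("!", "en") + "!");
    }
}
```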
-
isWhitespace
public static boolean isWhitespace(java.lang.String str)
Checks whether a string is a whitespace character, including:
- all Unicode whitespace
- the non-breaking space (U+00A0)
- the narrow non-breaking space (U+202F)
- the zero width space (U+200B), used in Khmer
- Parameters:
str - String to check
- Returns:
- true if the string is a whitespace character
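The extra code points are listed separately because `Character.isWhitespace` excludes non-breaking spaces and does not treat the zero width space as whitespace. The sketch below shows one way to combine the checks; the class name `WhitespaceSketch` is hypothetical, and treating the empty string as whitespace is an arbitrary choice, since the description does not say how it is handled.

```java
final class WhitespaceSketch {

    /** True for whitespace strings, including U+00A0, U+202F and U+200B. */
    static boolean isWhitespace(String str) {
        // Character.isWhitespace returns false for all three of these code points.
        if (str.equals("\u00A0") || str.equals("\u202F") || str.equals("\u200B")) {
            return true;
        }
        // Assumption: an empty string counts as whitespace (allMatch on an empty stream is true).
        return str.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace("\t"));
        System.out.println(isWhitespace("\u00A0"));
        System.out.println(isWhitespace("a"));
    }
}
```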
-
isNonBreakingWhitespace
public static boolean isNonBreakingWhitespace(java.lang.String str)
Checks if a string is the non-breaking whitespace.
- Since:
- 2.1
-
isPositiveNumber
public static boolean isPositiveNumber(char ch)
- Parameters:
ch - Character to check
- Returns:
- True if the character is a positive number (decimal digit from 1 to 9).
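Note that, per the description above, '0' does not count as a positive number. A range comparison captures this directly; the class name `PositiveDigitSketch` is hypothetical, and restricting the check to ASCII digits is an assumption.

```java
final class PositiveDigitSketch {

    /** True only for the ASCII digits '1' through '9'. */
    static boolean isPositiveNumber(char ch) {
        return ch >= '1' && ch <= '9';
    }

    public static void main(String[] args) {
        System.out.println(isPositiveNumber('7'));
        System.out.println(isPositiveNumber('0'));
    }
}
```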
-
isEmpty
public static boolean isEmpty(java.lang.String str)
Helper method to replace calls to "".equals().
- Parameters:
str - String to check
- Returns:
- true if string is empty or null
-
filterXML
public static java.lang.String filterXML(java.lang.String str)
Simple XML filtering for XML tags.
- Parameters:
str - XML string to be filtered.
- Returns:
- Filtered string without XML tags.
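Tag filtering of this kind is typically done with two regular expressions, one for comments and one for tags. The sketch below assumes that design; the class name `FilterXmlSketch` and both patterns are illustrative, because the contents of the private XML_COMMENT_PATTERN and XML_PATTERN fields are not documented here, and whether filterXML also strips comments is likewise an assumption.

```java
import java.util.regex.Pattern;

final class FilterXmlSketch {

    // Assumed patterns; the library's own patterns may differ.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    /** Removes comments first, then any remaining tags. */
    static String filterXML(String str) {
        String withoutComments = COMMENT.matcher(str).replaceAll("");
        return TAG.matcher(withoutComments).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(filterXML("<b>bold</b> text"));
    }
}
```

This is deliberately simple: it does not parse XML, so unusual input such as a `>` inside an attribute value is not handled correctly.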
-
asString
@Nullable public static @Nullable java.lang.String asString(java.lang.CharSequence s)
-
isParagraphEnd
public static boolean isParagraphEnd(java.lang.String sentence, boolean singleLineBreaksMarksPara)
- Since:
- 4.3
-
loadLines
public static java.util.List<java.lang.String> loadLines(java.lang.String path)
Loads file, ignoring comments (lines starting with #).
- Parameters:
path - path in resource dir
- Since:
- 4.6
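The comment filtering can be sketched independently of resource lookup. The version below reads from a `Reader` instead of a resource path so that it stays self-contained; the class name `LoadLinesSketch` is hypothetical, and keeping blank lines is an assumption, since the description only mentions comments.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

final class LoadLinesSketch {

    /** Returns all lines except those that start with '#'. */
    static List<String> loadLines(Reader source) {
        return new BufferedReader(source).lines()
                .filter(line -> !line.startsWith("#"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String content = "# word list\nalpha\n# another comment\nbeta\n";
        System.out.println(loadLines(new StringReader(content)));
    }
}
```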
-
-