Class UnicodeSetSpanner

java.lang.Object
com.ibm.icu.text.UnicodeSetSpanner

public class UnicodeSetSpanner extends Object
A helper class used to count, replace, and trim CharSequences based on UnicodeSet matches. An instance is immutable (and thus thread-safe) iff the source UnicodeSet is frozen.

Note: The counting, deletion, and replacement depend on alternating a UnicodeSet.SpanCondition with its inverse. That is, the code spans, then spans for the inverse, then spans, and so on. For the inverse, the following mapping is used:

SIMPLE → NOT_CONTAINED
CONTAINED → NOT_CONTAINED
NOT_CONTAINED → SIMPLE

These are not complete inverses. However, the alternation works because it leaves no gaps. For example, with [a{ab}{bc}], you get the following behavior when scanning forward:
SIMPLE           xxx[ab]cyyy
CONTAINED        xxx[abc]yyy
NOT_CONTAINED    [xxx]ab[cyyy]

So here is what happens when you alternate:

start            |xxxabcyyy
NOT_CONTAINED    xxx|abcyyy
CONTAINED        xxxabc|yyy
NOT_CONTAINED    xxxabcyyy|

The entire string is traversed.
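The alternation described above can be illustrated with a small model that does not use ICU4J. It is only a sketch under simplifying assumptions: the set is represented as a plain String of single characters (no multi-character strings such as {ab}), and the class name AlternatingSpans and the helpers span and boundaries are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class AlternatingSpans {
    // Simplified stand-in for a span call, assuming the set holds only single
    // characters: returns the end of the run starting at 'start' whose characters
    // are all in 'set' (contained == true) or all outside it (contained == false).
    static int span(CharSequence s, int start, String set, boolean contained) {
        int i = start;
        while (i < s.length() && (set.indexOf(s.charAt(i)) >= 0) == contained) {
            i++;
        }
        return i;
    }

    // Alternates "not contained" and "contained" spans and records where each
    // span ends. Because every character is either in the set or not, the two
    // conditions leave no gaps and the loop reaches the end of the string.
    static List<Integer> boundaries(CharSequence s, String set) {
        List<Integer> result = new ArrayList<>();
        boolean contained = false; // begin with the inverse condition
        int pos = 0;
        while (pos < s.length()) {
            pos = span(s, pos, set, contained);
            result.add(pos);
            contained = !contained;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("xxxabcyyy", "abc")); // [3, 6, 9]
    }
}
```

The recorded positions 3, 6, and 9 correspond to the boundaries xxx|abcyyy, xxxabc|yyy, and xxxabcyyy| in the walk-through above. A set that contains strings can behave differently, which is why the real class depends on the SpanCondition mapping.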

  • Constructor Details

    • UnicodeSetSpanner

      public UnicodeSetSpanner(UnicodeSet source)
      Create a spanner from a UnicodeSet. For speed and safety, the UnicodeSet should be frozen. However, this class can be used with a non-frozen version to avoid the cost of freezing.
      Parameters:
      source - the original UnicodeSet
  • Method Details

    • getUnicodeSet

      public UnicodeSet getUnicodeSet()
      Returns the UnicodeSet used for processing. It is frozen iff the original was.
      Returns:
      the construction set.
    • equals

      public boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • countIn

      public int countIn(CharSequence sequence)
      Returns the number of matching characters found in a character sequence, counting by CountMethod.MIN_ELEMENTS using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the sequence to count characters in
      Returns:
      the count. Zero if there are none.
    • countIn

      public int countIn(CharSequence sequence, UnicodeSetSpanner.CountMethod countMethod)
      Returns the number of matching characters found in a character sequence, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the sequence to count characters in
      countMethod - whether to treat an entire span as a match, or individual elements as matches
      Returns:
      the count. Zero if there are none.
    • countIn

      public int countIn(CharSequence sequence, UnicodeSetSpanner.CountMethod countMethod, UnicodeSet.SpanCondition spanCondition)
      Returns the number of matching characters found in a character sequence. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the sequence to count characters in
      countMethod - whether to treat an entire span as a match, or individual elements as matches
      spanCondition - the spanCondition to use. SIMPLE or CONTAINED means only count the elements in the span; NOT_CONTAINED is the reverse.
      WARNING: when a UnicodeSet contains strings, there may be unexpected behavior in edge cases.
      Returns:
      the count. Zero if there are none.
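To make the difference between the two count methods concrete, here is a minimal model of the counting loop. It is an illustrative sketch, not the ICU4J implementation: the set is assumed to contain only single BMP characters and is passed as a String, and CountSketch, span, and the local CountMethod enum are hypothetical names that merely mirror the terms used above.

```java
final class CountSketch {
    // Local enum that mirrors the names used in the documentation above.
    enum CountMethod { WHOLE_SPAN, MIN_ELEMENTS }

    // End of the run starting at 'start' whose characters are all inside
    // (contained == true) or all outside (contained == false) the set.
    static int span(CharSequence s, int start, String set, boolean contained) {
        int i = start;
        while (i < s.length() && (set.indexOf(s.charAt(i)) >= 0) == contained) {
            i++;
        }
        return i;
    }

    // Skips a non-matching run, then counts the matching run that follows,
    // either as one match (WHOLE_SPAN) or as one match per character
    // (MIN_ELEMENTS, under the single-character assumption).
    static int countIn(CharSequence s, String set, CountMethod method) {
        int count = 0;
        int pos = 0;
        while (pos < s.length()) {
            int matchStart = span(s, pos, set, false);
            if (matchStart == s.length()) {
                break; // nothing left to match
            }
            int matchEnd = span(s, matchStart, set, true);
            count += (method == CountMethod.WHOLE_SPAN) ? 1 : matchEnd - matchStart;
            pos = matchEnd;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countIn("abacatbab", "ab", CountMethod.WHOLE_SPAN));   // 3
        System.out.println(countIn("abacatbab", "ab", CountMethod.MIN_ELEMENTS)); // 7
    }
}
```

In "abacatbab" the matching spans are "aba", "a", and "bab", so counting whole spans gives 3 and counting individual elements gives 7. With ICU4J on the classpath, the analogous call would be countIn(sequence, UnicodeSetSpanner.CountMethod.WHOLE_SPAN) on a spanner built from new UnicodeSet("[ab]").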
    • deleteFrom

      public String deleteFrom(CharSequence sequence)
      Delete all the matching spans in sequence, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the character sequence to delete matching spans from.
      Returns:
      modified string.
    • deleteFrom

      public String deleteFrom(CharSequence sequence, UnicodeSet.SpanCondition spanCondition)
      Delete all matching spans in sequence, according to the spanCondition. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the character sequence to delete matching spans from.
      spanCondition - specify whether to delete the matching spans (CONTAINED or SIMPLE) or the non-matching spans (NOT_CONTAINED)
      Returns:
      modified string.
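The effect of the spanCondition choice on deletion can be sketched with a simplified model. This sketch assumes a set of single characters given as a String, so it can decide character by character instead of spanning; DeleteSketch and the deleteMatching flag are hypothetical stand-ins for the real class and for the choice between SIMPLE/CONTAINED and NOT_CONTAINED.

```java
final class DeleteSketch {
    // deleteMatching == true  models SIMPLE or CONTAINED: drop characters in the set.
    // deleteMatching == false models NOT_CONTAINED: drop characters outside the set.
    static String deleteFrom(CharSequence s, String set, boolean deleteMatching) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean inSet = set.indexOf(c) >= 0;
            if (inSet != deleteMatching) {
                out.append(c); // keep everything that is not selected for deletion
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(deleteFrom("abacatbab", "ab", true));  // ct
        System.out.println(deleteFrom("abacatbab", "ab", false)); // abaabab
    }
}
```

For sets that contain strings, a character-by-character decision is not sufficient, which is why the real method works on spans and carries the edge-case caveats noted in the class doc.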
    • replaceFrom

      public String replaceFrom(CharSequence sequence, CharSequence replacement)
      Replace all matching spans in sequence by the replacement, counting by CountMethod.MIN_ELEMENTS using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the character sequence to replace matching spans in.
      replacement - replacement sequence. To delete, use ""
      Returns:
      modified string.
    • replaceFrom

      public String replaceFrom(CharSequence sequence, CharSequence replacement, UnicodeSetSpanner.CountMethod countMethod)
      Replace all matching spans in sequence by replacement, according to the CountMethod, using SpanCondition.SIMPLE. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the character sequence to replace matching spans in.
      replacement - replacement sequence. To delete, use ""
      countMethod - whether to treat an entire span as a match, or individual elements as matches
      Returns:
      modified string.
    • replaceFrom

      public String replaceFrom(CharSequence sequence, CharSequence replacement, UnicodeSetSpanner.CountMethod countMethod, UnicodeSet.SpanCondition spanCondition)
      Replace all matching spans in sequence by replacement, according to the countMethod and spanCondition. The code alternates spans; see the class doc for UnicodeSetSpanner for a note about boundary conditions.
      Parameters:
      sequence - the character sequence to replace matching spans in.
      replacement - replacement sequence. To delete, use ""
      countMethod - whether to treat an entire span as a match, or individual elements as matches
      spanCondition - specify whether to modify the matching spans (CONTAINED or SIMPLE) or the non-matching (NOT_CONTAINED)
      Returns:
      modified string.
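The interaction between replacement and the count method can be modelled as follows. This is a sketch under stated assumptions rather than the library's code: the set holds only single characters and is passed as a String, the wholeSpan flag stands in for CountMethod.WHOLE_SPAN versus CountMethod.MIN_ELEMENTS, and ReplaceSketch and span are hypothetical names.

```java
final class ReplaceSketch {
    // End of the run starting at 'start' whose characters are all inside
    // (contained == true) or all outside (contained == false) the set.
    static int span(CharSequence s, int start, String set, boolean contained) {
        int i = start;
        while (i < s.length() && (set.indexOf(s.charAt(i)) >= 0) == contained) {
            i++;
        }
        return i;
    }

    // Alternates a matching span (replaced) with a non-matching span (copied).
    // wholeSpan == true emits one replacement per matching span; otherwise one
    // replacement per matching character.
    static String replaceFrom(CharSequence s, CharSequence replacement, String set, boolean wholeSpan) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < s.length()) {
            int matchEnd = span(s, pos, set, true);
            int matched = matchEnd - pos;
            int copies = wholeSpan ? Math.min(1, matched) : matched;
            for (int i = 0; i < copies; i++) {
                out.append(replacement);
            }
            int copyEnd = span(s, matchEnd, set, false);
            out.append(s, matchEnd, copyEnd); // copy the non-matching run unchanged
            pos = copyEnd;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceFrom("abacatbab", "-", "ab", false)); // ---c-t---
        System.out.println(replaceFrom("abacatbab", "-", "ab", true));  // -c-t-
    }
}
```

Passing an empty replacement reduces the model to deletion, which matches the "To delete, use """ note in the parameter descriptions.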
    • trim

      public CharSequence trim(CharSequence sequence)
      Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start and end of the string, using TrimOption.BOTH and SpanCondition.SIMPLE. For example:

         new UnicodeSetSpanner(new UnicodeSet("[ab]")).trim("abacatbab")

      ... returns "cat".
      Parameters:
      sequence - the sequence to trim
      Returns:
      a subsequence
    • trim

      public CharSequence trim(CharSequence sequence, UnicodeSetSpanner.TrimOption trimOption)
      Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start or end of the string, using the trimOption and SpanCondition.SIMPLE. For example:

         new UnicodeSetSpanner(new UnicodeSet("[ab]")).trim("abacatbab", TrimOption.LEADING)

      ... returns "catbab".
      Parameters:
      sequence - the sequence to trim
      trimOption - LEADING, TRAILING, or BOTH
      Returns:
      a subsequence
    • trim

      public CharSequence trim(CharSequence sequence, UnicodeSetSpanner.TrimOption trimOption, UnicodeSet.SpanCondition spanCondition)
      Returns a trimmed sequence (using CharSequence.subSequence()) that omits matching elements at the start or end of the string, depending on the trimOption and spanCondition. For example:

         new UnicodeSetSpanner(new UnicodeSet("[ab]")).trim("abacatbab", TrimOption.LEADING, SpanCondition.SIMPLE)

      ... returns "catbab".
      Parameters:
      sequence - the sequence to trim
      trimOption - LEADING, TRAILING, or BOTH
      spanCondition - SIMPLE, CONTAINED or NOT_CONTAINED
      Returns:
      a subsequence
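The three trim options can be compared side by side with a small model. As with any such sketch, it rests on assumptions: the set contains only single characters and is passed as a String, matching corresponds to SpanCondition.SIMPLE, and TrimSketch and its local TrimOption enum are hypothetical names that mirror the option names listed above.

```java
final class TrimSketch {
    // Local enum that mirrors the option names listed above.
    enum TrimOption { LEADING, BOTH, TRAILING }

    // Advances 'start' past leading matches and moves 'end' back past trailing
    // matches, then returns a view of the remaining range via subSequence.
    static CharSequence trim(CharSequence s, String set, TrimOption option) {
        int start = 0;
        int end = s.length();
        if (option != TrimOption.TRAILING) {
            while (start < end && set.indexOf(s.charAt(start)) >= 0) {
                start++;
            }
        }
        if (option != TrimOption.LEADING) {
            while (end > start && set.indexOf(s.charAt(end - 1)) >= 0) {
                end--;
            }
        }
        return s.subSequence(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trim("abacatbab", "ab", TrimOption.BOTH));     // cat
        System.out.println(trim("abacatbab", "ab", TrimOption.LEADING));  // catbab
        System.out.println(trim("abacatbab", "ab", TrimOption.TRAILING)); // abacat
    }
}
```

Because the result comes from subSequence, it is a CharSequence rather than necessarily a String; call toString() on it when a String is required.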