Class LCSubstringSolver.ConcurrentSuffixTreeImpl<V>

java.lang.Object
com.googlecode.concurrenttrees.radix.ConcurrentRadixTree<V>
com.googlecode.concurrenttrees.solver.LCSubstringSolver.ConcurrentSuffixTreeImpl<V>
All Implemented Interfaces:
PrettyPrintable, RadixTree<V>, Serializable
Enclosing class:
LCSubstringSolver

class LCSubstringSolver.ConcurrentSuffixTreeImpl<V> extends ConcurrentRadixTree<V>
  • Constructor Details

    • ConcurrentSuffixTreeImpl

      public ConcurrentSuffixTreeImpl(NodeFactory nodeFactory)
  • Method Details

    • acquireWriteLock

      protected void acquireWriteLock()
      Overrides:
      acquireWriteLock in class ConcurrentRadixTree<V>
    • releaseWriteLock

      protected void releaseWriteLock()
      Overrides:
      releaseWriteLock in class ConcurrentRadixTree<V>
    • lazyTraverseDescendants

      protected Iterable<ConcurrentRadixTree.NodeKeyPair> lazyTraverseDescendants(CharSequence startKey, Node startNode)
      Description copied from class: ConcurrentRadixTree
      Traverses the tree using depth-first, preordered traversal, starting at the given node, using lazy evaluation such that the next node is only determined when next() is called on the iterator returned. The traversal algorithm uses iteration instead of recursion to allow deep trees to be traversed without requiring large JVM stack sizes.

      Each node that is encountered is returned from the iterator along with a key associated with that node, in a NodeKeyPair object. The key will be prefixed by the given start key, and will be generated by appending to the start key the edges traversed along the path to that node from the start node.

      Overrides:
      lazyTraverseDescendants in class ConcurrentRadixTree<V>
      Parameters:
      startKey - The key which matches the given start node
      startNode - The start node
      Returns:
      An iterator which when iterated traverses the tree using depth-first, preordered traversal, starting at the given start node
    • getLongestCommonSubstring

      CharSequence getLongestCommonSubstring()
      The main algorithm to find the longest common substring.
      1. Traverses all nodes in the suffix tree
      2. For each node checks if the path from the root via edges to that node is longer than the longest common substring encountered so far (and so is a candidate)
      3. Calls helper method subTreeReferencesAllOriginalDocuments(CharSequence, Node), supplying the candidate node. That method returns true if nodes in the sub-tree descending from that node collectively references all of the original documents added to the solver
      4. If the nodes in the sub-tree do collectively reference all documents, then the path from root to the current node is a substring of all documents
      5. Updates the longest common substring encountered so far if the conditions above hold for the current node
      6. Continues traversing the tree until all nodes have been checked
      Implementation note: Method subTreeReferencesAllOriginalDocuments(CharSequence, Node) will stop traversal early if it finds all original documents early. This method currently does not apply a similar optimization, and will actually descend into and apply the same tests to branches which the helper method already indicated are dead-ends(!). Future work might be to use this knowledge, skip dead-end branches, but it would involve not using any of the traversal logic from superclasses and overriding it all here for this one use case.
      Returns:
      The longest common substring
    • subTreeReferencesAllOriginalDocuments

      boolean subTreeReferencesAllOriginalDocuments(CharSequence startKey, Node startNode)
      Returns true if the given node and its descendants collectively reference all original documents added to the solver.

      This method will traverse the entire sub-tree until it has encountered all of the original documents. If it encounters all of the original documents early, before exhausting all nodes, returns early.

      Parameters:
      startKey - The key associated with the start node (concatenation of edges from root leading to it)
      startNode - The root of the sub-tree to traverse
      Returns:
      True if the given node and its descendants collectively reference all original documents added to the solver, false if the sub-tree does not reference all documents added to the solver