Class ZhangShashaTreeEditDistance<V,​E>

  • Type Parameters:
    V - graph vertex type
    E - graph edge type

    public class ZhangShashaTreeEditDistance<V,​E>
    extends java.lang.Object
    Dynamic programming algorithm for computing edit distance between trees.

    The algorithm is originally described in Zhang, Kaizhong & Shasha, Dennis. (1989). Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput.. 18. 1245-1262. 10.1137/0218082.

    The time complexity of the algorithm is $O(|T_1|\cdot|T_2|\cdot min(depth(T_1),leaves(T_1)) \cdot min(depth(T_2),leaves(T_2)))$. Space complexity is $O(|T_1|\cdot |T_2|)$, where $|T_1|$ and $|T_2|$ denote number of vertices in trees $T_1$ and $T_2$ correspondingly, $leaves()$ function returns number of leaf vertices in a tree.

    The tree edit distance problem is defined in a following way. Consider $2$ trees $T_1$ and $T_2$ with root vertices $r_1$ and $r_2$ correspondingly. For those trees there are 3 elementary modification actions:

    • Remove a vertex $v$ from $T_1$.
    • Insert a vertex $v$ into $T_2$.
    • Change vertex $v_1$ in $T_1$ to vertex $v_2$ in $T_2$.
    The algorithm assigns a cost to each of those operations which also depends on the vertices. The problem is then to compute a sequence of edit operations which transforms $T_1$ into $T_2$ and has a minimum cost over all such sequences. Here the cost of a sequence of edit operations is defined as sum of costs of individual operations.

    The algorithm is based on a dynamic programming principle and assigns a label to each vertex in the trees which is equal to its index in post-oder traversal. It also uses a notion of a keyroot which is defined as a vertex in a tree which has a left sibling. Additionally a special $l()$ function is introduced with returns for every vertex the index of its leftmost child wrt the post-order traversal in the tree.

    Solving the tree edit problem distance is divided into computing edit distance for every pair of subtrees rooted at vertices $v_1$ and $v_2$ where $v_1$ is a keyroot in the first tree and $v_2$ is a keyroot in the second tree.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private boolean algorithmExecuted
      Helper field which indicates whether the algorithm has already been executed for tree1 and tree2.
      private java.util.function.ToDoubleBiFunction<V,​V> changeCost
      Function which computes cost of changing a vertex $v1$ in tree1 to vertex $v2$ in tree2.
      private java.util.List<java.util.List<java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>>> editOperationLists
      Array with lists of edit operations which transform subtrees of tree1 into subtrees tree2.
      private java.util.function.ToDoubleFunction<V> insertCost
      Function which computes cost of inserting a particular vertex into tree2.
      private java.util.function.ToDoubleFunction<V> removeCost
      Function which computes cost of removing a particular vertex from {2code tree1}.
      private V root1
      Root vertex of the tree1.
      private V root2
      Root vertex of the tree2.
      private Graph<V,​E> tree1
      First tree for which the distance is computed by this algorithm.
      private Graph<V,​E> tree2
      Second tree for which the distance is computed by this algorithm.
      private double[][] treeDistances
      Array with edit distances between subtrees of tree1 and tree2.
    • Constructor Summary

      Constructors 
      Constructor Description
      ZhangShashaTreeEditDistance​(Graph<V,​E> tree1, V root1, Graph<V,​E> tree2, V root2)
      Constructs an instance of the algorithm for the given tree1, root1, tree2 and root2.
      ZhangShashaTreeEditDistance​(Graph<V,​E> tree1, V root1, Graph<V,​E> tree2, V root2, java.util.function.ToDoubleFunction<V> insertCost, java.util.function.ToDoubleFunction<V> removeCost, java.util.function.ToDoubleBiFunction<V,​V> changeCost)
      Constructs an instance of the algorithm for the given tree1, root1, tree2, root2, insertCost, removeCost and changeCost.
    • Field Detail

      • tree1

        private Graph<V,​E> tree1
        First tree for which the distance is computed by this algorithm.
      • root1

        private V root1
        Root vertex of the tree1.
      • tree2

        private Graph<V,​E> tree2
        Second tree for which the distance is computed by this algorithm.
      • root2

        private V root2
        Root vertex of the tree2.
      • insertCost

        private java.util.function.ToDoubleFunction<V> insertCost
        Function which computes cost of inserting a particular vertex into tree2.
      • removeCost

        private java.util.function.ToDoubleFunction<V> removeCost
        Function which computes cost of removing a particular vertex from {2code tree1}.
      • changeCost

        private java.util.function.ToDoubleBiFunction<V,​V> changeCost
        Function which computes cost of changing a vertex $v1$ in tree1 to vertex $v2$ in tree2.
      • treeDistances

        private double[][] treeDistances
        Array with edit distances between subtrees of tree1 and tree2. Formally, $treeDistances[i][j]$ stores edit distance between subtree of tree1 rooted at vertex $i+1$ and subtree of tree2 rooted at vertex $j+1$, where $i$ and $j$ are vertex indices from the corresponding tree orderings.
      • editOperationLists

        private java.util.List<java.util.List<java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>>> editOperationLists
        Array with lists of edit operations which transform subtrees of tree1 into subtrees tree2. Formally, editOperationLists[i][j]$ stores a list of edit operations which transform subtree tree1 rooted at vertex $i$ into subtree of tree2 rooted at vertex $j$, where $i$ and $j$ are vertex indices from the corresponding tree orderings.
      • algorithmExecuted

        private boolean algorithmExecuted
        Helper field which indicates whether the algorithm has already been executed for tree1 and tree2.
    • Constructor Detail

      • ZhangShashaTreeEditDistance

        public ZhangShashaTreeEditDistance​(Graph<V,​E> tree1,
                                           V root1,
                                           Graph<V,​E> tree2,
                                           V root2)
        Constructs an instance of the algorithm for the given tree1, root1, tree2 and root2. This constructor sets following default values for the distance functions. The insertCost and removeCost always return $1.0$, the changeCost return $0.0$ if vertices are equal and 1.0 otherwise.
        Parameters:
        tree1 - a tree
        root1 - root vertex of tree1
        tree2 - a tree
        root2 - root vertex of tree2
      • ZhangShashaTreeEditDistance

        public ZhangShashaTreeEditDistance​(Graph<V,​E> tree1,
                                           V root1,
                                           Graph<V,​E> tree2,
                                           V root2,
                                           java.util.function.ToDoubleFunction<V> insertCost,
                                           java.util.function.ToDoubleFunction<V> removeCost,
                                           java.util.function.ToDoubleBiFunction<V,​V> changeCost)
        Constructs an instance of the algorithm for the given tree1, root1, tree2, root2, insertCost, removeCost and changeCost.
        Parameters:
        tree1 - a tree
        root1 - root vertex of tree1
        tree2 - a tree
        root2 - root vertex of tree2
        insertCost - cost function for inserting a node into tree1
        removeCost - cost function for removing a node from tree2
        changeCost - cost function of changing a node in tree1 to a node in tree2
    • Method Detail

      • getDistance

        public double getDistance()
        Computes edit distance for tree1 and tree2.
        Returns:
        edit distance between tree1 and tree2
      • getEditOperationLists

        public java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>> getEditOperationLists()
        Computes a list of edit operations which transform tree1 into tree2.
        Returns:
        list of edit operations
      • lazyRunAlgorithm

        private void lazyRunAlgorithm()
        Performs lazy computations of this algorithm and stores cached data in treeDistances and editOperationList.
      • treeDistance

        private void treeDistance​(int i,
                                  int j,
                                  ZhangShashaTreeEditDistance.TreeOrdering ordering1,
                                  ZhangShashaTreeEditDistance.TreeOrdering ordering2)
        Computes edit distance and list of edit operations for vertex $v1$ from tree1 which has tree ordering index equal to $i$ and vertex $v2$ from tree2 which has tree ordering index equal to $j$. Both $v1$ and $v2$ must be keyroots in the corresponding trees.
        Parameters:
        i - ordering index of a keyroot in tree1
        j - ordering index of a keywoot in tree2
        ordering1 - ordering of tree1
        ordering2 - ordering of tree2
      • restoreOperationsList

        private java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList​(java.util.List<java.util.List<ZhangShashaTreeEditDistance.CacheEntry>> cachedOperations,
                                                                                                   int i,
                                                                                                   int j)
        Restores list of edit operations which have been cached in cachedOperations during the edit distance computation. Starting from a cache entry at index $(i,j)$.
        Parameters:
        cachedOperations - 2-dimensional list with cached operations
        i - starting operation index
        j - starting operation index
        Returns:
        list of edit operations