- Type Parameters:
V
- graph vertex typeE
- graph edge type
The algorithm is originally described in Zhang, Kaizhong & Shasha, Dennis. (1989). Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput.. 18. 1245-1262. 10.1137/0218082.
The time complexity of the algorithm is $O(|T_1|\cdot|T_2|\cdot min(depth(T_1),leaves(T_1)) \cdot min(depth(T_2),leaves(T_2)))$. Space complexity is $O(|T_1|\cdot |T_2|)$, where $|T_1|$ and $|T_2|$ denote number of vertices in trees $T_1$ and $T_2$ correspondingly, $leaves()$ function returns number of leaf vertices in a tree.
The tree edit distance problem is defined in a following way. Consider $2$ trees $T_1$ and $T_2$ with root vertices $r_1$ and $r_2$ correspondingly. For those trees there are 3 elementary modification actions:
- Remove a vertex $v$ from $T_1$.
- Insert a vertex $v$ into $T_2$.
- Change vertex $v_1$ in $T_1$ to vertex $v_2$ in $T_2$.
The algorithm is based on a dynamic programming principle and assigns a label to each vertex in the trees which is equal to its index in post-oder traversal. It also uses a notion of a keyroot which is defined as a vertex in a tree which has a left sibling. Additionally a special $l()$ function is introduced with returns for every vertex the index of its leftmost child wrt the post-order traversal in the tree.
Solving the tree edit problem distance is divided into computing edit distance for every pair of subtrees rooted at vertices $v_1$ and $v_2$ where $v_1$ is a keyroot in the first tree and $v_2$ is a keyroot in the second tree.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate class
Auxiliary class which is used intreeDistance()
function to store intermediate edit operations during dynamic programming computation.static class
Represents elementary action which changes the structure of a tree.static enum
Type of an edit operation.private class
Auxiliary class which for computes keyroot vertices, tree ordering and $l()$ function for a particular tree. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate boolean
Helper field which indicates whether the algorithm has already been executed fortree1
andtree2
.private ToDoubleBiFunction
<V, V> Function which computes cost of changing a vertex $v1$ intree1
to vertex $v2$ intree2
.private List
<List<List<ZhangShashaTreeEditDistance.EditOperation<V>>>> Array with lists of edit operations which transform subtrees oftree1
into subtreestree2
.private ToDoubleFunction
<V> Function which computes cost of inserting a particular vertex intotree2
.private ToDoubleFunction
<V> Function which computes cost of removing a particular vertex from {2code tree1}.private V
Root vertex of thetree1
.private V
Root vertex of thetree2
.First tree for which the distance is computed by this algorithm.Second tree for which the distance is computed by this algorithm.private double[][]
Array with edit distances between subtrees oftree1
andtree2
. -
Constructor Summary
ConstructorsConstructorDescriptionConstructs an instance of the algorithm for the giventree1
,root1
,tree2
androot2
.ZhangShashaTreeEditDistance
(Graph<V, E> tree1, V root1, Graph<V, E> tree2, V root2, ToDoubleFunction<V> insertCost, ToDoubleFunction<V> removeCost, ToDoubleBiFunction<V, V> changeCost) Constructs an instance of the algorithm for the giventree1
,root1
,tree2
,root2
,insertCost
,removeCost
andchangeCost
. -
Method Summary
Modifier and TypeMethodDescriptiondouble
Computes edit distance fortree1
andtree2
.Computes a list of edit operations which transformtree1
intotree2
.private void
Performs lazy computations of this algorithm and stores cached data intreeDistances
andeditOperationList
.private List
<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList
(List<List<ZhangShashaTreeEditDistance<V, E>.CacheEntry>> cachedOperations, int i, int j) Restores list of edit operations which have been cached incachedOperations
during the edit distance computation.private void
treeDistance
(int i, int j, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering1, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering2) Computes edit distance and list of edit operations for vertex $v1$ fromtree1
which has tree ordering index equal to $i$ and vertex $v2$ fromtree2
which has tree ordering index equal to $j$.
-
Field Details
-
tree1
First tree for which the distance is computed by this algorithm. -
root1
Root vertex of thetree1
. -
tree2
Second tree for which the distance is computed by this algorithm. -
root2
Root vertex of thetree2
. -
insertCost
Function which computes cost of inserting a particular vertex intotree2
. -
removeCost
Function which computes cost of removing a particular vertex from {2code tree1}. -
changeCost
Function which computes cost of changing a vertex $v1$ intree1
to vertex $v2$ intree2
. -
treeDistances
private double[][] treeDistancesArray with edit distances between subtrees oftree1
andtree2
. Formally, $treeDistances[i][j]$ stores edit distance between subtree oftree1
rooted at vertex $i+1$ and subtree oftree2
rooted at vertex $j+1$, where $i$ and $j$ are vertex indices from the corresponding tree orderings. -
editOperationLists
Array with lists of edit operations which transform subtrees oftree1
into subtreestree2
. Formally, editOperationLists[i][j]$ stores a list of edit operations which transform subtreetree1
rooted at vertex $i$ into subtree oftree2
rooted at vertex $j$, where $i$ and $j$ are vertex indices from the corresponding tree orderings. -
algorithmExecuted
private boolean algorithmExecutedHelper field which indicates whether the algorithm has already been executed fortree1
andtree2
.
-
-
Constructor Details
-
ZhangShashaTreeEditDistance
Constructs an instance of the algorithm for the giventree1
,root1
,tree2
androot2
. This constructor sets following default values for the distance functions. TheinsertCost
andremoveCost
always return $1.0$, thechangeCost
return $0.0$ if vertices are equal and1.0
otherwise.- Parameters:
tree1
- a treeroot1
- root vertex oftree1
tree2
- a treeroot2
- root vertex oftree2
-
ZhangShashaTreeEditDistance
public ZhangShashaTreeEditDistance(Graph<V, E> tree1, V root1, Graph<V, E> tree2, V root2, ToDoubleFunction<V> insertCost, ToDoubleFunction<V> removeCost, ToDoubleBiFunction<V, V> changeCost) Constructs an instance of the algorithm for the giventree1
,root1
,tree2
,root2
,insertCost
,removeCost
andchangeCost
.- Parameters:
tree1
- a treeroot1
- root vertex oftree1
tree2
- a treeroot2
- root vertex oftree2
insertCost
- cost function for inserting a node intotree1
removeCost
- cost function for removing a node fromtree2
changeCost
- cost function of changing a node intree1
to a node intree2
-
-
Method Details
-
getDistance
public double getDistance()Computes edit distance fortree1
andtree2
.- Returns:
- edit distance between
tree1
andtree2
-
getEditOperationLists
Computes a list of edit operations which transformtree1
intotree2
.- Returns:
- list of edit operations
-
lazyRunAlgorithm
private void lazyRunAlgorithm()Performs lazy computations of this algorithm and stores cached data intreeDistances
andeditOperationList
. -
treeDistance
private void treeDistance(int i, int j, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering1, ZhangShashaTreeEditDistance<V, E>.TreeOrdering ordering2) Computes edit distance and list of edit operations for vertex $v1$ fromtree1
which has tree ordering index equal to $i$ and vertex $v2$ fromtree2
which has tree ordering index equal to $j$. Both $v1$ and $v2$ must be keyroots in the corresponding trees.- Parameters:
i
- ordering index of a keyroot intree1
j
- ordering index of a keywoot intree2
ordering1
- ordering oftree1
ordering2
- ordering oftree2
-
restoreOperationsList
private List<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList(List<List<ZhangShashaTreeEditDistance<V, E>.CacheEntry>> cachedOperations, int i, int j) Restores list of edit operations which have been cached incachedOperations
during the edit distance computation. Starting from a cache entry at index $(i,j)$.- Parameters:
cachedOperations
- 2-dimensional list with cached operationsi
- starting operation indexj
- starting operation index- Returns:
- list of edit operations
-