- java.lang.Object
-
- org.jgrapht.alg.similarity.ZhangShashaTreeEditDistance<V,E>
-
- Type Parameters:
V
- graph vertex typeE
- graph edge type
public class ZhangShashaTreeEditDistance<V,E> extends java.lang.Object
Dynamic programming algorithm for computing edit distance between trees.The algorithm is originally described in Zhang, Kaizhong & Shasha, Dennis. (1989). Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems. SIAM J. Comput.. 18. 1245-1262. 10.1137/0218082.
The time complexity of the algorithm is $O(|T_1|\cdot|T_2|\cdot min(depth(T_1),leaves(T_1)) \cdot min(depth(T_2),leaves(T_2)))$. Space complexity is $O(|T_1|\cdot |T_2|)$, where $|T_1|$ and $|T_2|$ denote number of vertices in trees $T_1$ and $T_2$ correspondingly, $leaves()$ function returns number of leaf vertices in a tree.
The tree edit distance problem is defined in a following way. Consider $2$ trees $T_1$ and $T_2$ with root vertices $r_1$ and $r_2$ correspondingly. For those trees there are 3 elementary modification actions:
- Remove a vertex $v$ from $T_1$.
- Insert a vertex $v$ into $T_2$.
- Change vertex $v_1$ in $T_1$ to vertex $v_2$ in $T_2$.
The algorithm is based on a dynamic programming principle and assigns a label to each vertex in the trees which is equal to its index in post-oder traversal. It also uses a notion of a keyroot which is defined as a vertex in a tree which has a left sibling. Additionally a special $l()$ function is introduced with returns for every vertex the index of its leftmost child wrt the post-order traversal in the tree.
Solving the tree edit problem distance is divided into computing edit distance for every pair of subtrees rooted at vertices $v_1$ and $v_2$ where $v_1$ is a keyroot in the first tree and $v_2$ is a keyroot in the second tree.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private class
ZhangShashaTreeEditDistance.CacheEntry
Auxiliary class which is used intreeDistance()
function to store intermediate edit operations during dynamic programming computation.static class
ZhangShashaTreeEditDistance.EditOperation<V>
Represents elementary action which changes the structure of a tree.static class
ZhangShashaTreeEditDistance.OperationType
Type of an edit operation.private class
ZhangShashaTreeEditDistance.TreeOrdering
Auxiliary class which for computes keyroot vertices, tree ordering and $l()$ function for a particular tree.
-
Field Summary
Fields Modifier and Type Field Description private boolean
algorithmExecuted
Helper field which indicates whether the algorithm has already been executed fortree1
andtree2
.private java.util.function.ToDoubleBiFunction<V,V>
changeCost
Function which computes cost of changing a vertex $v1$ intree1
to vertex $v2$ intree2
.private java.util.List<java.util.List<java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>>>
editOperationLists
Array with lists of edit operations which transform subtrees oftree1
into subtreestree2
.private java.util.function.ToDoubleFunction<V>
insertCost
Function which computes cost of inserting a particular vertex intotree2
.private java.util.function.ToDoubleFunction<V>
removeCost
Function which computes cost of removing a particular vertex from {2code tree1}.private V
root1
Root vertex of thetree1
.private V
root2
Root vertex of thetree2
.private Graph<V,E>
tree1
First tree for which the distance is computed by this algorithm.private Graph<V,E>
tree2
Second tree for which the distance is computed by this algorithm.private double[][]
treeDistances
Array with edit distances between subtrees oftree1
andtree2
.
-
Constructor Summary
Constructors Constructor Description ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2)
Constructs an instance of the algorithm for the giventree1
,root1
,tree2
androot2
.ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2, java.util.function.ToDoubleFunction<V> insertCost, java.util.function.ToDoubleFunction<V> removeCost, java.util.function.ToDoubleBiFunction<V,V> changeCost)
Constructs an instance of the algorithm for the giventree1
,root1
,tree2
,root2
,insertCost
,removeCost
andchangeCost
.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description double
getDistance()
Computes edit distance fortree1
andtree2
.java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>
getEditOperationLists()
Computes a list of edit operations which transformtree1
intotree2
.private void
lazyRunAlgorithm()
Performs lazy computations of this algorithm and stores cached data intreeDistances
andeditOperationList
.private java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>
restoreOperationsList(java.util.List<java.util.List<ZhangShashaTreeEditDistance.CacheEntry>> cachedOperations, int i, int j)
Restores list of edit operations which have been cached incachedOperations
during the edit distance computation.private void
treeDistance(int i, int j, ZhangShashaTreeEditDistance.TreeOrdering ordering1, ZhangShashaTreeEditDistance.TreeOrdering ordering2)
Computes edit distance and list of edit operations for vertex $v1$ fromtree1
which has tree ordering index equal to $i$ and vertex $v2$ fromtree2
which has tree ordering index equal to $j$.
-
-
-
Field Detail
-
root1
private V root1
Root vertex of thetree1
.
-
root2
private V root2
Root vertex of thetree2
.
-
insertCost
private java.util.function.ToDoubleFunction<V> insertCost
Function which computes cost of inserting a particular vertex intotree2
.
-
removeCost
private java.util.function.ToDoubleFunction<V> removeCost
Function which computes cost of removing a particular vertex from {2code tree1}.
-
changeCost
private java.util.function.ToDoubleBiFunction<V,V> changeCost
Function which computes cost of changing a vertex $v1$ intree1
to vertex $v2$ intree2
.
-
treeDistances
private double[][] treeDistances
Array with edit distances between subtrees oftree1
andtree2
. Formally, $treeDistances[i][j]$ stores edit distance between subtree oftree1
rooted at vertex $i+1$ and subtree oftree2
rooted at vertex $j+1$, where $i$ and $j$ are vertex indices from the corresponding tree orderings.
-
editOperationLists
private java.util.List<java.util.List<java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>>>> editOperationLists
Array with lists of edit operations which transform subtrees oftree1
into subtreestree2
. Formally, editOperationLists[i][j]$ stores a list of edit operations which transform subtreetree1
rooted at vertex $i$ into subtree oftree2
rooted at vertex $j$, where $i$ and $j$ are vertex indices from the corresponding tree orderings.
-
algorithmExecuted
private boolean algorithmExecuted
Helper field which indicates whether the algorithm has already been executed fortree1
andtree2
.
-
-
Constructor Detail
-
ZhangShashaTreeEditDistance
public ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2)
Constructs an instance of the algorithm for the giventree1
,root1
,tree2
androot2
. This constructor sets following default values for the distance functions. TheinsertCost
andremoveCost
always return $1.0$, thechangeCost
return $0.0$ if vertices are equal and1.0
otherwise.- Parameters:
tree1
- a treeroot1
- root vertex oftree1
tree2
- a treeroot2
- root vertex oftree2
-
ZhangShashaTreeEditDistance
public ZhangShashaTreeEditDistance(Graph<V,E> tree1, V root1, Graph<V,E> tree2, V root2, java.util.function.ToDoubleFunction<V> insertCost, java.util.function.ToDoubleFunction<V> removeCost, java.util.function.ToDoubleBiFunction<V,V> changeCost)
Constructs an instance of the algorithm for the giventree1
,root1
,tree2
,root2
,insertCost
,removeCost
andchangeCost
.- Parameters:
tree1
- a treeroot1
- root vertex oftree1
tree2
- a treeroot2
- root vertex oftree2
insertCost
- cost function for inserting a node intotree1
removeCost
- cost function for removing a node fromtree2
changeCost
- cost function of changing a node intree1
to a node intree2
-
-
Method Detail
-
getDistance
public double getDistance()
Computes edit distance fortree1
andtree2
.- Returns:
- edit distance between
tree1
andtree2
-
getEditOperationLists
public java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>> getEditOperationLists()
Computes a list of edit operations which transformtree1
intotree2
.- Returns:
- list of edit operations
-
lazyRunAlgorithm
private void lazyRunAlgorithm()
Performs lazy computations of this algorithm and stores cached data intreeDistances
andeditOperationList
.
-
treeDistance
private void treeDistance(int i, int j, ZhangShashaTreeEditDistance.TreeOrdering ordering1, ZhangShashaTreeEditDistance.TreeOrdering ordering2)
Computes edit distance and list of edit operations for vertex $v1$ fromtree1
which has tree ordering index equal to $i$ and vertex $v2$ fromtree2
which has tree ordering index equal to $j$. Both $v1$ and $v2$ must be keyroots in the corresponding trees.- Parameters:
i
- ordering index of a keyroot intree1
j
- ordering index of a keywoot intree2
ordering1
- ordering oftree1
ordering2
- ordering oftree2
-
restoreOperationsList
private java.util.List<ZhangShashaTreeEditDistance.EditOperation<V>> restoreOperationsList(java.util.List<java.util.List<ZhangShashaTreeEditDistance.CacheEntry>> cachedOperations, int i, int j)
Restores list of edit operations which have been cached incachedOperations
during the edit distance computation. Starting from a cache entry at index $(i,j)$.- Parameters:
cachedOperations
- 2-dimensional list with cached operationsi
- starting operation indexj
- starting operation index- Returns:
- list of edit operations
-
-