Class DiffImpl
It is based on the following two papers (both papers are from the same authors. The first one is newer, but the older one is more detailed in some aspects)
Accelerating Graph Similarity Search via Efficient GED Computation (https://lijunchang.github.io/pdf/2022-ged-tkde.pdf)
Efficient Graph Edit Distance Computation and Verification via Anchor-aware Lower Bound Estimation (https://arxiv.org/abs/1709.06810)
The algorithm is a modified version of "AStar-BMao". It is adapted to directed graphs as a GraphQL schema is most naturally represented as directed graph (vs the undirected graphs used in the papers).
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static class
static class
An optimal edit from one graph to another. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final SchemaGraph
private final SchemaGraph
private final PossibleMappingsCalculator.PossibleMappings
private final PossibleMappingsCalculator
private final SchemaDiffingRunningCheck
-
Constructor Summary
ConstructorsConstructorDescriptionDiffImpl
(PossibleMappingsCalculator possibleMappingsCalculator, SchemaGraph completeSourceGraph, SchemaGraph completeTargetGraph, PossibleMappingsCalculator.PossibleMappings possibleMappings, SchemaDiffingRunningCheck runningCheck) -
Method Summary
Modifier and TypeMethodDescriptionprivate void
addChildToQueue
(int fixedEditorialCost, DiffImpl.MappingEntry parentEntry, PriorityQueue<DiffImpl.MappingEntry> queue, DiffImpl.OptimalEdit optimalEdit, List<Vertex> allSources, List<Vertex> allTargets) private void
addSiblingToQueue
(int fixedEditorialCost, int level, PriorityQueue<DiffImpl.MappingEntry> queue, DiffImpl.OptimalEdit optimalEdit, List<Vertex> allSources, List<Vertex> allTargets, DiffImpl.MappingEntry mappingEntry) private int
calcAnchoredVerticesCost
(Vertex v, Vertex u, Mapping partialMapping) private double
calcLowerBoundMappingCost
(Vertex v, Vertex u, Mapping partialMapping, Map<Vertex, Double> isolatedVerticesCache, Map<Vertex, Vertex> nonFixedParentRestrictions) lower bound mapping cost between for v -> u in respect to a partial mapping.private double
calcLowerBoundMappingCostForIsolated
(Vertex vertex, Mapping partialMapping, boolean sourceOrTarget) Simplified lower bound calc if the source/target vertex is isolatedprivate void
calculateRestOfChildren
(List<Vertex> availableTargetVertices, HungarianAlgorithm hungarianAlgorithm, double[][] costMatrixCopy, double editorialCostForMapping, Mapping partialMapping, Vertex v_i, int upperBound, int level, LinkedBlockingQueue<DiffImpl.MappingEntry> siblings) (package private) DiffImpl.OptimalEdit
diffImpl
(Mapping startMapping, List<Vertex> allSources, List<Vertex> allTargets, AtomicInteger algoIterationCount) private void
expandMappingAndUpdateOptimalMapping
(int fixedEditorialCost, int level, DiffImpl.OptimalEdit optimalEdit, List<Vertex> allSources, Mapping toExpand, int[] assignments, List<Vertex> availableTargetVertices, double lowerBoundCost) Extend the partial mapping to a full mapping according to the optimal matching (hungarian algo result) and update the optimal edit if we found a better one.private double
getCostMatrixSum
(double[][] costMatrix, int[] assignments) private void
updateOptimalEdit
(DiffImpl.OptimalEdit optimalEdit, int newGed, Mapping mapping)
-
Field Details
-
possibleMappingsCalculator
-
completeSourceGraph
-
completeTargetGraph
-
possibleMappings
-
runningCheck
-
-
Constructor Details
-
DiffImpl
public DiffImpl(PossibleMappingsCalculator possibleMappingsCalculator, SchemaGraph completeSourceGraph, SchemaGraph completeTargetGraph, PossibleMappingsCalculator.PossibleMappings possibleMappings, SchemaDiffingRunningCheck runningCheck)
-
-
Method Details
-
diffImpl
DiffImpl.OptimalEdit diffImpl(Mapping startMapping, List<Vertex> allSources, List<Vertex> allTargets, AtomicInteger algoIterationCount) throws Exception - Throws:
Exception
-
addChildToQueue
private void addChildToQueue(int fixedEditorialCost, DiffImpl.MappingEntry parentEntry, PriorityQueue<DiffImpl.MappingEntry> queue, DiffImpl.OptimalEdit optimalEdit, List<Vertex> allSources, List<Vertex> allTargets) -
updateOptimalEdit
-
calculateRestOfChildren
private void calculateRestOfChildren(List<Vertex> availableTargetVertices, HungarianAlgorithm hungarianAlgorithm, double[][] costMatrixCopy, double editorialCostForMapping, Mapping partialMapping, Vertex v_i, int upperBound, int level, LinkedBlockingQueue<DiffImpl.MappingEntry> siblings) -
addSiblingToQueue
private void addSiblingToQueue(int fixedEditorialCost, int level, PriorityQueue<DiffImpl.MappingEntry> queue, DiffImpl.OptimalEdit optimalEdit, List<Vertex> allSources, List<Vertex> allTargets, DiffImpl.MappingEntry mappingEntry) throws InterruptedException - Throws:
InterruptedException
-
expandMappingAndUpdateOptimalMapping
private void expandMappingAndUpdateOptimalMapping(int fixedEditorialCost, int level, DiffImpl.OptimalEdit optimalEdit, List<Vertex> allSources, Mapping toExpand, int[] assignments, List<Vertex> availableTargetVertices, double lowerBoundCost) Extend the partial mapping to a full mapping according to the optimal matching (hungarian algo result) and update the optimal edit if we found a better one. -
getCostMatrixSum
private double getCostMatrixSum(double[][] costMatrix, int[] assignments) -
calcLowerBoundMappingCost
private double calcLowerBoundMappingCost(Vertex v, Vertex u, Mapping partialMapping, Map<Vertex, Double> isolatedVerticesCache, Map<Vertex, Vertex> nonFixedParentRestrictions) lower bound mapping cost between for v -> u in respect to a partial mapping. It basically tells the minimal costs we can expect for all mappings that come from extending the partial mapping with v -> u.This is basically the formula (5) from page 6 of https://lijunchang.github.io/pdf/2022-ged-tkde.pdf.
The main difference is that the formula works with undirected graphs, but we have a directed graph, hence there is no 1/2 factor and for comparing the labels of anchored vertices to v/u we need to take both directions into account.
The other optimization is that a schema graph will have never a lot of adjacent edges compared to the overall vertices count, therefore the algorithm for the anchored vertices costs iterates over the adjacent edges of v/u instead of all the mapped vertices.
Additionally, there is a shortcut for isolated vertices, representing deletion/insertion which is also cached.
Some naming: an anchored vertex is a vertex that is mapped via the partial mapping. An inner edge is an edge between two vertices that are both not anchored (mapped). The vertices v and u are by definition not mapped.
-
calcAnchoredVerticesCost
-
calcLowerBoundMappingCostForIsolated
private double calcLowerBoundMappingCostForIsolated(Vertex vertex, Mapping partialMapping, boolean sourceOrTarget) Simplified lower bound calc if the source/target vertex is isolated
-