Class DiffImpl


  • public class DiffImpl
    extends java.lang.Object
    This is an algorithm calculating the optimal edit to change the source graph into the target graph.

    It is based on the following two papers (both papers are from the same authors. The first one is newer, but the older one is more detailed in some aspects)

    Accelerating Graph Similarity Search via Efficient GED Computation (https://lijunchang.github.io/pdf/2022-ged-tkde.pdf)

    Efficient Graph Edit Distance Computation and Verification via Anchor-aware Lower Bound Estimation (https://arxiv.org/abs/1709.06810)

    The algorithm is a modified version of "AStar-BMao". It is adapted to directed graphs as a GraphQL schema is most naturally represented as directed graph (vs the undirected graphs used in the papers).

    • Method Detail

      • diffImpl

        DiffImpl.OptimalEdit diffImpl​(Mapping startMapping,
                                      java.util.List<Vertex> allSources,
                                      java.util.List<Vertex> allTargets,
                                      java.util.concurrent.atomic.AtomicInteger algoIterationCount)
                               throws java.lang.Exception
        Throws:
        java.lang.Exception
      • calculateRestOfChildren

        private void calculateRestOfChildren​(java.util.List<Vertex> availableTargetVertices,
                                             HungarianAlgorithm hungarianAlgorithm,
                                             double[][] costMatrixCopy,
                                             double editorialCostForMapping,
                                             Mapping partialMapping,
                                             Vertex v_i,
                                             int upperBound,
                                             int level,
                                             java.util.concurrent.LinkedBlockingQueue<DiffImpl.MappingEntry> siblings)
      • expandMappingAndUpdateOptimalMapping

        private void expandMappingAndUpdateOptimalMapping​(int fixedEditorialCost,
                                                          int level,
                                                          DiffImpl.OptimalEdit optimalEdit,
                                                          java.util.List<Vertex> allSources,
                                                          Mapping toExpand,
                                                          int[] assignments,
                                                          java.util.List<Vertex> availableTargetVertices,
                                                          double lowerBoundCost)
        Extend the partial mapping to a full mapping according to the optimal matching (hungarian algo result) and update the optimal edit if we found a better one.
      • getCostMatrixSum

        private double getCostMatrixSum​(double[][] costMatrix,
                                        int[] assignments)
      • calcLowerBoundMappingCost

        private double calcLowerBoundMappingCost​(Vertex v,
                                                 Vertex u,
                                                 Mapping partialMapping,
                                                 java.util.Map<Vertex,​java.lang.Double> isolatedVerticesCache,
                                                 java.util.Map<Vertex,​Vertex> nonFixedParentRestrictions)
        lower bound mapping cost between for v -> u in respect to a partial mapping. It basically tells the minimal costs we can expect for all mappings that come from extending the partial mapping with v -> u.

        This is basically the formula (5) from page 6 of https://lijunchang.github.io/pdf/2022-ged-tkde.pdf.

        The main difference is that the formula works with undirected graphs, but we have a directed graph, hence there is no 1/2 factor and for comparing the labels of anchored vertices to v/u we need to take both directions into account.

        The other optimization is that a schema graph will have never a lot of adjacent edges compared to the overall vertices count, therefore the algorithm for the anchored vertices costs iterates over the adjacent edges of v/u instead of all the mapped vertices.

        Additionally, there is a shortcut for isolated vertices, representing deletion/insertion which is also cached.

        Some naming: an anchored vertex is a vertex that is mapped via the partial mapping. An inner edge is an edge between two vertices that are both not anchored (mapped). The vertices v and u are by definition not mapped.

      • calcAnchoredVerticesCost

        private int calcAnchoredVerticesCost​(Vertex v,
                                             Vertex u,
                                             Mapping partialMapping)
      • calcLowerBoundMappingCostForIsolated

        private double calcLowerBoundMappingCostForIsolated​(Vertex vertex,
                                                            Mapping partialMapping,
                                                            boolean sourceOrTarget)
        Simplified lower bound calc if the source/target vertex is isolated