Class Transform


  • public class Transform
    extends java.lang.Object
    Static methods that manipulate immutable graphs.

    Most methods take an ImmutableGraph (along with other data that depends on the kind of transformation) and return another ImmutableGraph representing the transformed version.

    • Field Detail

      • NO_LOOPS

        public static final it.unimi.dsi.big.webgraph.Transform.NoLoops NO_LOOPS
        A singleton providing an arc filter that rejects loops.
    • Method Detail

      • filterArcs

        public static ImmutableGraph filterArcs​(ImmutableGraph graph,
                                                Transform.ArcFilter filter,
                                                it.unimi.dsi.logging.ProgressLogger ignored)
        Returns a graph with some arcs possibly stripped, according to the given filter.
        Parameters:
        graph - a graph.
        filter - the filter (telling whether each arc should be kept or not).
        ignored - a progress logger, which will be ignored.
        Returns:
        the filtered graph.
      • filterArcs

        public static ArcLabelledImmutableGraph filterArcs​(ArcLabelledImmutableGraph graph,
                                                           Transform.LabelledArcFilter filter,
                                                           it.unimi.dsi.logging.ProgressLogger ignored)
        Returns a labelled graph with some arcs possibly stripped, according to the given filter.
        Parameters:
        graph - a labelled graph.
        filter - the filter (telling whether each arc should be kept or not).
        ignored - a progress logger, which will be ignored.
        Returns:
        the filtered graph.
      • filterArcs

        public static ImmutableGraph filterArcs​(ImmutableGraph graph,
                                                Transform.ArcFilter filter)
        Returns a graph with some arcs possibly stripped, according to the given filter.
        Parameters:
        graph - a graph.
        filter - the filter (telling whether each arc should be kept or not).
        Returns:
        the filtered graph.
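
        The effect of an arc filter can be pictured with a small in-memory sketch. It does not use the WebGraph classes: ArcFilterSketch, ArcPredicate and the list-based graph representation are hypothetical stand-ins chosen for illustration, and only the loop-rejecting behaviour of NO_LOOPS is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a tiny in-memory stand-in, not the WebGraph classes. */
final class ArcFilterSketch {
    /** Hypothetical counterpart of an arc filter: tells whether arc (source, target) is kept. */
    interface ArcPredicate {
        boolean accept(long source, long target);
    }

    /** A filter that rejects loops, in the spirit of NO_LOOPS. */
    static final ArcPredicate NO_LOOPS = (source, target) -> source != target;

    /** Returns a copy of the successor lists containing only the accepted arcs. */
    static List<List<Long>> filterArcs(List<List<Long>> successors, ArcPredicate filter) {
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < successors.size(); i++) {
            List<Long> kept = new ArrayList<>();
            for (long j : successors.get(i)) {
                if (filter.accept(i, j)) kept.add(j);
            }
            result.add(kept);
        }
        return result;
    }

    public static void main(String[] args) {
        // Node 0 -> {0, 1}, node 1 -> {0, 1, 2}, node 2 -> {2}.
        List<List<Long>> g = List.of(List.of(0L, 1L), List.of(0L, 1L, 2L), List.of(2L));
        System.out.println(filterArcs(g, NO_LOOPS));
    }
}
```

        The source graph is left untouched and a new structure is returned, mirroring the fact that the methods of this class return a new ImmutableGraph.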
      • symmetrizeOffline

        public static ImmutableGraph symmetrizeOffline​(ImmutableGraph g,
                                                       int batchSize)
                                                throws java.io.IOException
        Returns a symmetrized graph using an offline transposition.
        Parameters:
        g - the source graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Returns:
        the symmetrized graph.
        Throws:
        java.io.IOException
        See Also:
        symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
      • symmetrizeOffline

        public static ImmutableGraph symmetrizeOffline​(ImmutableGraph g,
                                                       int batchSize,
                                                       java.io.File tempDir)
                                                throws java.io.IOException
        Returns a symmetrized graph using an offline transposition.
        Parameters:
        g - the source graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Returns:
        the symmetrized graph.
        Throws:
        java.io.IOException
        See Also:
        symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
      • symmetrizeOffline

        public static ImmutableGraph symmetrizeOffline​(ImmutableGraph g,
                                                       int batchSize,
                                                       java.io.File tempDir,
                                                       it.unimi.dsi.logging.ProgressLogger pl)
                                                throws java.io.IOException
        Returns a symmetrized graph using an offline transposition.

        The symmetrized graph is the union of a graph and its transpose. This method computes the transpose on the fly using transposeOffline(ImmutableGraph, int, File, ProgressLogger).

        Parameters:
        g - the source graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger, or null.
        Returns:
        the symmetrized graph.
        Throws:
        java.io.IOException
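
        As a rough illustration of what "union of a graph and its transpose" means, the sketch below symmetrizes a small graph held entirely in memory. It is not how this method works internally (the real method goes through sorted batches on disk); SymmetrizeSketch and the int[][] successor representation are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative only: in-memory symmetrization of a small graph. */
final class SymmetrizeSketch {
    /** Returns, for each node, the sorted successors in the union of the graph and its transpose. */
    static List<TreeSet<Integer>> symmetrize(int[][] successors) {
        int n = successors.length;
        List<TreeSet<Integer>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) result.add(new TreeSet<>());
        for (int i = 0; i < n; i++) {
            for (int j : successors[i]) {
                result.get(i).add(j); // arc of the original graph
                result.get(j).add(i); // the same arc reversed, i.e., an arc of the transpose
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A directed path 0 -> 1 -> 2 becomes an undirected path.
        System.out.println(symmetrize(new int[][] {{1}, {2}, {}}));
    }
}
```

        Loops, if present, are preserved by this sketch, as in a plain union.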
      • simplify

        public static ImmutableGraph simplify​(ImmutableGraph g,
                                              ImmutableGraph t,
                                              it.unimi.dsi.logging.ProgressLogger pl)
        Returns a simplified (loopless and symmetric) graph using the graph and its transpose.
        Parameters:
        g - the source graph.
        t - the graph g transposed.
        pl - a progress logger, or null.
        Returns:
        the simplified (loopless and symmetric) graph.
      • simplify

        public static ImmutableGraph simplify​(ImmutableGraph g,
                                              ImmutableGraph t)
        Returns a simplified (loopless and symmetric) graph using the graph and its transpose.
        Parameters:
        g - the source graph.
        t - the graph g transposed.
        Returns:
        the simplified (loopless and symmetric) graph.
      • simplifyOffline

        public static ImmutableGraph simplifyOffline​(ImmutableGraph g,
                                                     int batchSize)
                                              throws java.io.IOException
        Returns a simplified (loopless and symmetric) graph using an offline transposition.
        Parameters:
        g - the source graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Returns:
        the simplified (loopless and symmetric) graph.
        Throws:
        java.io.IOException
        See Also:
        simplifyOffline(ImmutableGraph, int, File, ProgressLogger)
      • simplifyOffline

        public static ImmutableGraph simplifyOffline​(ImmutableGraph g,
                                                     int batchSize,
                                                     java.io.File tempDir)
                                              throws java.io.IOException
        Returns a simplified (loopless and symmetric) graph using an offline transposition.
        Parameters:
        g - the source graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Returns:
        the simplified (loopless and symmetric) graph.
        Throws:
        java.io.IOException
        See Also:
        simplifyOffline(ImmutableGraph, int, File, ProgressLogger)
      • simplifyOffline

        public static ImmutableGraph simplifyOffline​(ImmutableGraph g,
                                                     int batchSize,
                                                     java.io.File tempDir,
                                                     it.unimi.dsi.logging.ProgressLogger pl)
                                              throws java.io.IOException
        Returns a simplified (loopless and symmetric) graph using an offline transposition.

        The simplified graph is the union of a graph and its transpose, with the loops removed. This method computes the transpose on the fly using transposeOffline(ImmutableGraph, int, File, ProgressLogger).

        Parameters:
        g - the source graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger, or null.
        Returns:
        the simplified (loopless and symmetric) graph.
        Throws:
        java.io.IOException
      • processBatch

        public static int processBatch​(int n,
                                       long[] source,
                                       long[] target,
                                       java.io.File tempDir,
                                       java.util.List<java.io.File> batches)
                                throws java.io.IOException
        Sorts the given source and target arrays w.r.t. the target and stores them in a temporary file.
        Parameters:
        n - the index of the last element to be sorted (exclusive).
        source - the source array.
        target - the target array.
        tempDir - a temporary directory in which to store the sorted arrays, or null.
        batches - a list of files to which the batch file will be added.
        Returns:
        the number of pairs in the batch (might be less than n because duplicates are eliminated).
        Throws:
        java.io.IOException
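
        The sorting and deduplication step described above can be sketched as follows. Writing the batch to a temporary file is deliberately left out; BatchSortSketch and sortAndDeduplicate are hypothetical names, and ordering ties by source is an assumption of this sketch rather than something stated here.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only: sorts arcs by target and drops duplicates, without touching the disk. */
final class BatchSortSketch {
    /**
     * Sorts the first n (source, target) pairs by target (ties broken by source),
     * removes duplicate pairs, compacts the arrays in place and returns the number of pairs left.
     */
    static int sortAndDeduplicate(int n, long[] source, long[] target) {
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.<Integer>comparingLong(i -> target[i]).thenComparingLong(i -> source[i]));

        long[] s = new long[n];
        long[] t = new long[n];
        int kept = 0;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            // After sorting, equal pairs are adjacent, so comparing with the last kept pair suffices.
            if (kept > 0 && s[kept - 1] == source[i] && t[kept - 1] == target[i]) continue;
            s[kept] = source[i];
            t[kept] = target[i];
            kept++;
        }
        System.arraycopy(s, 0, source, 0, kept);
        System.arraycopy(t, 0, target, 0, kept);
        return kept;
    }

    public static void main(String[] args) {
        long[] source = {0, 1, 0, 2, 1};
        long[] target = {2, 0, 2, 1, 0};
        int kept = sortAndDeduplicate(5, source, target);
        System.out.println(kept + " pairs kept");
    }
}
```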
      • transposeOffline

        public static ImmutableSequentialGraph transposeOffline​(ImmutableGraph g,
                                                                int batchSize)
                                                         throws java.io.IOException
        Returns an immutable graph obtained by reversing all arcs in g, using an offline method.
        Parameters:
        g - an immutable graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Returns:
        an immutable, sequentially accessible graph obtained by transposing g.
        Throws:
        java.io.IOException
        See Also:
        transposeOffline(ImmutableGraph, int, File, ProgressLogger)
      • transposeOffline

        public static ImmutableSequentialGraph transposeOffline​(ImmutableGraph g,
                                                                int batchSize,
                                                                java.io.File tempDir)
                                                         throws java.io.IOException
        Returns an immutable graph obtained by reversing all arcs in g, using an offline method.
        Parameters:
        g - an immutable graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Returns:
        an immutable, sequentially accessible graph obtained by transposing g.
        Throws:
        java.io.IOException
        See Also:
        transposeOffline(ImmutableGraph, int, File, ProgressLogger)
      • transposeOffline

        public static ImmutableSequentialGraph transposeOffline​(ImmutableGraph g,
                                                                int batchSize,
                                                                java.io.File tempDir,
                                                                it.unimi.dsi.logging.ProgressLogger pl)
                                                         throws java.io.IOException
        Returns an immutable graph obtained by reversing all arcs in g, using an offline method.

        This method creates a number of sorted batches on disk, each containing arcs represented by pairs of gap-compressed long integers ordered by target, and returns an ImmutableGraph that can be accessed only through a node iterator. The node iterator merges the batches on the fly, providing a transposed graph. The files are marked with File.deleteOnExit(), so they should disappear when the JVM exits. An additional safety-net finaliser tries to delete the batches, too.

        Note that each NodeIterator returned by the transpose requires opening all batches at the same time. The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case, another safety-net finaliser closes all files when the iterator is collected.

        This method can process offline graphs.

        Parameters:
        g - an immutable graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger.
        Returns:
        an immutable, sequentially accessible graph obtained by transposing g.
        Throws:
        java.io.IOException
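
        The on-the-fly merge of sorted batches can be pictured as a k-way merge driven by a priority queue. The sketch below keeps the batches in memory as arrays of {source, target} pairs instead of gap-compressed files; BatchMergeSketch, Cursor and mergeTransposed are hypothetical names, and the code is a simplified model rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: k-way merge of batches sorted by target, yielding the transposed graph. */
final class BatchMergeSketch {
    /** A read position inside one batch; each pair is {source, target}. */
    private static final class Cursor {
        final long[][] batch;
        int pos;
        Cursor(long[][] batch) { this.batch = batch; }
        long source() { return batch[pos][0]; }
        long target() { return batch[pos][1]; }
    }

    /** Merges batches (each sorted by target, then source) into successor lists of the transpose. */
    static List<List<Long>> mergeTransposed(int numNodes, List<long[][]> batches) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> {
            int c = Long.compare(a.target(), b.target());
            return c != 0 ? c : Long.compare(a.source(), b.source());
        });
        // All batches are "open" at the same time, as noted for the node iterator.
        for (long[][] batch : batches) if (batch.length > 0) queue.add(new Cursor(batch));

        List<List<Long>> transposed = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) transposed.add(new ArrayList<>());

        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            List<Long> successors = transposed.get((int) c.target());
            long s = c.source();
            // Defensive check: skip an arc already emitted from another batch.
            if (successors.isEmpty() || successors.get(successors.size() - 1) != s) successors.add(s);
            if (++c.pos < c.batch.length) queue.add(c); // an exhausted batch is simply dropped
        }
        return transposed;
    }

    public static void main(String[] args) {
        // Arcs 1 -> 0, 0 -> 2 in one batch and 2 -> 0, 2 -> 1 in another.
        List<long[][]> batches = List.of(new long[][] {{1, 0}, {0, 2}}, new long[][] {{2, 0}, {2, 1}});
        System.out.println(mergeTransposed(3, batches));
    }
}
```

        Because the smallest pending target is always served first, the successor list of each node of the transpose is produced in one sequential pass, which is why only sequential access is offered.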
      • logBatches

        protected static void logBatches​(it.unimi.dsi.fastutil.objects.ObjectArrayList<java.io.File> batches,
                                         long pairs,
                                         it.unimi.dsi.logging.ProgressLogger pl)
      • mapOffline

        public static ImmutableSequentialGraph mapOffline​(ImmutableGraph g,
                                                          long[][] map,
                                                          int batchSize)
                                                   throws java.io.IOException
        Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via a big array.
        Parameters:
        g - an immutable graph.
        map - the transformation map.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Returns:
        an immutable, sequentially accessible graph obtained by transforming g.
        Throws:
        java.io.IOException
        See Also:
        mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)
      • mapOffline

        public static ImmutableSequentialGraph mapOffline​(ImmutableGraph g,
                                                          long[][] map,
                                                          int batchSize,
                                                          java.io.File tempDir)
                                                   throws java.io.IOException
        Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via a big array.
        Parameters:
        g - an immutable graph.
        map - the transformation map.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Returns:
        an immutable, sequentially accessible graph obtained by transforming g.
        Throws:
        java.io.IOException
        See Also:
        mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)
      • mapOffline

        public static ImmutableSequentialGraph mapOffline​(ImmutableGraph g,
                                                          long[][] map,
                                                          int batchSize,
                                                          java.io.File tempDir,
                                                          it.unimi.dsi.logging.ProgressLogger pl)
                                                   throws java.io.IOException
        Remaps the graph nodes through a partial function specified via a big array, using an offline method.

        More specifically, LongBigArrays.length(map)=g.numNodes(), and LongBigArrays.get(map, i) is the new name of node i, or -1 if the node should not be mapped. If some index appearing in map is larger than or equal to the number of nodes of g, the resulting graph is enlarged correspondingly.

        Arcs are mapped in the obvious way; in other words, there is an arc from LongBigArrays.get(map, i) to LongBigArrays.get(map, j) (both nonnegative) in the transformed graph iff there was an arc from i to j in the original graph.

        Note that if map is bijective, the returned graph is simply a permutation of the original graph. Otherwise, the returned graph is obtained by deleting nodes mapped to -1, quotienting nodes w.r.t. the equivalence relation induced by the fibres of map and renumbering the result, always according to map. See transposeOffline(ImmutableGraph, int, File, ProgressLogger) for implementation and performance-related details.

        Parameters:
        g - an immutable graph.
        map - the transformation map.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger, or null.
        Returns:
        an immutable, sequentially accessible graph obtained by transforming g.
        Throws:
        java.io.IOException
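
        The arc-mapping rule above (nodes mapped to -1 vanish, nodes with the same image are merged) can be checked on a toy graph. The sketch uses a plain long[] in place of a big array and returns only the arcs of the result, so it says nothing about the number of nodes of the mapped graph; MapSketch and mapArcs are hypothetical names.

```java
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: applies a partial node map to the arcs of a small in-memory graph. */
final class MapSketch {
    /**
     * Returns the arcs of the mapped graph, keyed by (new) source node.
     * map[i] is the new name of node i, or -1 if node i must be dropped.
     */
    static TreeMap<Long, TreeSet<Long>> mapArcs(long[][] successors, long[] map) {
        TreeMap<Long, TreeSet<Long>> result = new TreeMap<>();
        for (int i = 0; i < successors.length; i++) {
            if (map[i] == -1) continue; // source is not mapped: all its arcs disappear
            for (long j : successors[i]) {
                if (map[(int) j] == -1) continue; // target is not mapped: the arc disappears
                result.computeIfAbsent(map[i], k -> new TreeSet<>()).add(map[(int) j]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arcs 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 0; nodes 0 and 1 are merged, node 3 is dropped.
        long[][] g = {{1}, {2}, {0}, {0}};
        long[] map = {0, 0, 1, -1};
        System.out.println(mapArcs(g, map));
    }
}
```

        In the example the arc 0 -> 1 turns into a loop on node 0, which shows that quotienting can create loops that were not in the original graph.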
      • transposeOffline

        public static ArcLabelledImmutableGraph transposeOffline​(ArcLabelledImmutableGraph g,
                                                                 int batchSize)
                                                          throws java.io.IOException
        Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.
        Parameters:
        g - an immutable graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
        Returns:
        an immutable, sequentially accessible graph obtained by transposing g.
        Throws:
        java.io.IOException
        See Also:
        transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
      • transposeOffline

        public static ArcLabelledImmutableGraph transposeOffline​(ArcLabelledImmutableGraph g,
                                                                 int batchSize,
                                                                 java.io.File tempDir)
                                                          throws java.io.IOException
        Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.
        Parameters:
        g - an immutable graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Returns:
        an immutable, sequentially accessible graph obtained by transposing g.
        Throws:
        java.io.IOException
        See Also:
        transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
      • transposeOffline

        public static ArcLabelledImmutableGraph transposeOffline​(ArcLabelledImmutableGraph g,
                                                                 int batchSize,
                                                                 java.io.File tempDir,
                                                                 it.unimi.dsi.logging.ProgressLogger pl)
                                                          throws java.io.IOException
        Returns an arc-labelled immutable graph obtained by reversing all arcs in g, using an offline method.

        This method creates a number of sorted batches on disk, each containing arcs represented by pairs of long integers in DataInput format ordered by target, and returns an ImmutableGraph that can be accessed only through a node iterator. The node iterator merges the batches on the fly, providing a transposed graph. The files are marked with File.deleteOnExit(), so they should disappear when the JVM exits. An additional safety-net finaliser tries to delete the batches, too. Labels are temporarily stored in an in-memory bit stream, which is permuted when it is stored on disk.

        Note that each NodeIterator returned by the transpose requires opening all batches at the same time. The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case, another safety-net finaliser closes all files when the iterator is collected.

        This method can process offline graphs. Note that no method to transpose arc-labelled graphs on-line is currently provided.

        Parameters:
        g - an immutable graph.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method, plus an additional FastByteArrayOutputStream needed to store all the labels for a batch.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger.
        Returns:
        an immutable, sequentially accessible graph obtained by transposing g.
        Throws:
        java.io.IOException
      • union

        public static ImmutableGraph union​(ImmutableGraph g0,
                                           ImmutableGraph g1)
        Returns the union of two immutable graphs.

        The two arguments may differ in the number of nodes, in which case the resulting graph will be as large as the larger graph.

        Parameters:
        g0 - the first graph.
        g1 - the second graph.
        Returns:
        the union of the two graphs.
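
        A minimal in-memory sketch of a union of graphs with different numbers of nodes follows. It assumes sorted, duplicate-free successor arrays and merges them node by node; UnionSketch and mergeSorted are hypothetical helpers, not part of this class.

```java
import java.util.Arrays;

/** Illustrative only: node-by-node union of two small graphs with sorted successor arrays. */
final class UnionSketch {
    /** Returns the union; the result has as many nodes as the larger argument. */
    static long[][] union(long[][] g0, long[][] g1) {
        int n = Math.max(g0.length, g1.length);
        long[][] result = new long[n][];
        for (int i = 0; i < n; i++) {
            // A node missing from the smaller graph is treated as having no successors.
            long[] a = i < g0.length ? g0[i] : new long[0];
            long[] b = i < g1.length ? g1[i] : new long[0];
            result[i] = mergeSorted(a, b);
        }
        return result;
    }

    /** Merges two sorted, duplicate-free arrays, keeping a single copy of common elements. */
    static long[] mergeSorted(long[] a, long[] b) {
        long[] merged = new long[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length || j < b.length) {
            long next;
            if (j == b.length || (i < a.length && a[i] < b[j])) next = a[i++];
            else if (i == a.length || b[j] < a[i]) next = b[j++];
            else { next = a[i++]; j++; } // same successor in both graphs
            merged[k++] = next;
        }
        return Arrays.copyOf(merged, k);
    }

    public static void main(String[] args) {
        long[][] g0 = {{1}, {0}};          // two nodes
        long[][] g1 = {{1, 2}, {}, {0}};   // three nodes
        System.out.println(Arrays.deepToString(union(g0, g1)));
    }
}
```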
      • compose

        public static ImmutableGraph compose​(ImmutableGraph g0,
                                             ImmutableGraph g1)
        Returns the composition (a.k.a. matrix product) of two immutable graphs.

        The two arguments may differ in the number of nodes, in which case the resulting graph will be as large as the larger graph.

        Parameters:
        g0 - the first graph.
        g1 - the second graph.
        Returns:
        the composition of the two graphs.
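
        Read as a Boolean matrix product, the composition has an arc from x to z whenever some y exists with an arc from x to y in g0 and an arc from y to z in g1. The sketch below spells this out on in-memory arrays; the order of the factors follows the usual matrix-product convention, which is an assumption here, and ComposeSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative only: composition of two small graphs seen as Boolean matrices. */
final class ComposeSketch {
    /** Returns the successors of each node in the composition of g0 followed by g1. */
    static List<TreeSet<Integer>> compose(int[][] g0, int[][] g1) {
        int n = Math.max(g0.length, g1.length);
        List<TreeSet<Integer>> result = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            TreeSet<Integer> successors = new TreeSet<>();
            if (x < g0.length) {
                for (int y : g0[x]) {
                    if (y >= g1.length) continue; // y has no successors in the smaller graph
                    for (int z : g1[y]) successors.add(z); // two-step path x -> y -> z
                }
            }
            result.add(successors);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] g0 = {{1}, {2}, {}};     // 0 -> 1, 1 -> 2
        int[][] g1 = {{}, {0}, {0, 1}};  // 1 -> 0, 2 -> 0, 2 -> 1
        System.out.println(compose(g0, g1));
    }
}
```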
      • compose

        public static ArcLabelledImmutableGraph compose​(ArcLabelledImmutableGraph g0,
                                                        ArcLabelledImmutableGraph g1,
                                                        LabelSemiring strategy)
        Returns the composition (a.k.a. matrix product) of two arc-labelled immutable graphs.

        The two arguments may differ in the number of nodes, in which case the resulting graph will be as large as the larger graph.

        Parameters:
        g0 - the first graph.
        g1 - the second graph.
        strategy - a label semiring.
        Returns:
        the composition of the two graphs.
        Implementation Specification:
        This implementation requires outdegrees smaller than 2^32.
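
        For labelled graphs the matrix product is taken over a semiring: labels along a two-step path are combined with the semiring product, and alternative paths to the same node are combined with the semiring sum. The sketch below models this with integer labels and two operators passed as lambdas; it does not use the LabelSemiring interface, and LabelledComposeSketch and its map-based graph representation are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntBinaryOperator;

/** Illustrative only: semiring-based composition of two small labelled graphs. */
final class LabelledComposeSketch {
    /** Each graph maps a node to its successors and their integer labels. */
    static List<Map<Integer, Integer>> compose(List<Map<Integer, Integer>> g0, List<Map<Integer, Integer>> g1,
            IntBinaryOperator add, IntBinaryOperator multiply) {
        int n = Math.max(g0.size(), g1.size());
        List<Map<Integer, Integer>> result = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            Map<Integer, Integer> row = new TreeMap<>();
            if (x < g0.size()) {
                for (Map.Entry<Integer, Integer> xy : g0.get(x).entrySet()) {
                    int y = xy.getKey();
                    if (y >= g1.size()) continue;
                    for (Map.Entry<Integer, Integer> yz : g1.get(y).entrySet()) {
                        // Semiring product along the path x -> y -> z ...
                        int product = multiply.applyAsInt(xy.getValue(), yz.getValue());
                        // ... and semiring sum over all paths reaching the same z.
                        row.merge(yz.getKey(), product, add::applyAsInt);
                    }
                }
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        // With (min, +) the label of x -> z is the cost of the cheapest two-step path.
        List<Map<Integer, Integer>> g0 = List.of(Map.of(1, 2, 2, 5), Map.of(), Map.of());
        List<Map<Integer, Integer>> g1 = List.of(Map.of(), Map.of(0, 1), Map.of(0, 3));
        System.out.println(compose(g0, g1, Math::min, Integer::sum));
    }
}
```

        Swapping the two operators changes the meaning of the result without touching the traversal, which is the point of passing the semiring as a strategy.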
      • grayCodePermutation

        public static long[][] grayCodePermutation​(ImmutableGraph g)
        Returns a permutation that would put the adjacency lists of the given graph in Gray-code order.

        Gray codes list all sequences of n zeros and ones in such a way that consecutive sequences differ by exactly one bit. If we assign to each row of the adjacency matrix of a graph its index as a Gray code, we obtain a permutation that brings similar rows closer together.

        Note that since a graph permutation permutes both rows and columns, this transformation is not idempotent: the Gray-code permutation produced from a matrix that has been Gray-code sorted will not be, in general, the identity.

        The important feature of Gray-code ordering is that it is completely endogenous (i.e., determined by the graph itself), contrary to, say, lexicographic URL ordering (which relies on knowledge of the URL associated with each node).

        Parameters:
        g - an immutable graph.
        Returns:
        the permutation that would order the graph adjacency lists by Gray order (you can just pass it to mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)).
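
        For very small graphs, the idea of ranking each adjacency row by its position in the Gray-code sequence can be sketched directly. The code assumes at most 63 nodes, treats column 0 as the most significant bit and sorts by increasing rank; these choices, as well as the names GrayCodeSketch, grayRank and grayCodePermutation, are assumptions of the sketch and may not match the orientation used by the library.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only: Gray-code ordering of adjacency rows for graphs with at most 63 nodes. */
final class GrayCodeSketch {
    /** Rank of an n-bit adjacency row (column 0 = most significant bit) in the reflected Gray code. */
    static long grayRank(int[] successors, int n) {
        long code = 0;
        for (int j : successors) code |= 1L << (n - 1 - j);
        long rank = 0;
        for (long c = code; c != 0; c >>>= 1) rank ^= c; // prefix XOR decodes a Gray codeword
        return rank;
    }

    /** Returns perm such that perm[i] is the new name of node i once rows are sorted by Gray rank. */
    static int[] grayCodePermutation(int[][] g) {
        int n = g.length; // sketch only: requires n <= 63
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> grayRank(g[i], n)));
        int[] perm = new int[n];
        for (int pos = 0; pos < n; pos++) perm[order[pos]] = pos; // invert the sorted order
        return perm;
    }

    public static void main(String[] args) {
        // Rows 100, 011 and 001 have Gray ranks 7, 2 and 1, respectively.
        System.out.println(Arrays.toString(grayCodePermutation(new int[][] {{0}, {1, 2}, {2}})));
    }
}
```

        The sketch sorts rows only; it does not apply the permutation to the columns, which is why repeating the procedure on the permuted graph need not give the identity.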
      • randomPermutation

        public static long[][] randomPermutation​(ImmutableGraph g,
                                                 long seed)
        Returns a random permutation for a given graph.
        Parameters:
        g - an immutable graph.
        seed - the seed for XoRoShiRo128PlusRandom.
        Returns:
        a random permutation for the given graph.
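
        A seeded random permutation of the nodes can be sketched with a Fisher-Yates shuffle. The example swaps in java.util.Random for XoRoShiRo128PlusRandom and a plain long[] for a big array, so it will not reproduce the permutations returned by this method; RandomPermutationSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: seeded Fisher-Yates shuffle of the node identifiers 0..n-1. */
final class RandomPermutationSketch {
    static long[] randomPermutation(int n, long seed) {
        long[] perm = new long[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        Random random = new Random(seed); // stand-in for XoRoShiRo128PlusRandom
        for (int i = n - 1; i > 0; i--) {
            int j = random.nextInt(i + 1); // uniform in [0, i]
            long t = perm[i];
            perm[i] = perm[j];
            perm[j] = t;
        }
        return perm;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(randomPermutation(8, 42L)));
    }
}
```

        Using an explicit seed makes the permutation reproducible across runs, which is convenient when the permuted graph has to be regenerated.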
      • lexicographicalPermutation

        public static long[][] lexicographicalPermutation​(ImmutableGraph g)
        Returns a permutation that would put the adjacency lists of the given graph in lexicographical order.

        Note that since a graph permutation permutes both rows and columns, this transformation is not idempotent: the lexicographical permutation produced from a matrix that has been lexicographically sorted will not be, in general, the identity.

        The important feature of lexicographical ordering is that it is completely endogenous (i.e., determined by the graph itself), contrary to, say, lexicographic URL ordering (which relies on knowledge of the URL associated with each node).

        Warning: rows are numbered from zero from the left. This means, for instance, that nodes with an arc towards node zero are lexicographically smaller than nodes without it.

        Parameters:
        g - an immutable graph.
        Returns:
        the permutation that would order the graph adjacency lists by lexicographical order (you can just pass it to mapOffline(ImmutableGraph, long[][], int)).
      • main

        public static void main​(java.lang.String[] args)
                         throws java.io.IOException,
                                java.lang.IllegalArgumentException,
                                java.lang.SecurityException,
                                java.lang.InstantiationException,
                                java.lang.IllegalAccessException,
                                java.lang.reflect.InvocationTargetException,
                                java.lang.NoSuchMethodException,
                                java.lang.ClassNotFoundException,
                                com.martiansoftware.jsap.JSAPException
        Throws:
        java.io.IOException
        java.lang.IllegalArgumentException
        java.lang.SecurityException
        java.lang.InstantiationException
        java.lang.IllegalAccessException
        java.lang.reflect.InvocationTargetException
        java.lang.NoSuchMethodException
        java.lang.ClassNotFoundException
        com.martiansoftware.jsap.JSAPException