Class ScatteredArcsASCIIGraph

  • All Implemented Interfaces:
    it.unimi.dsi.lang.FlyweightPrototype<ImmutableGraph>

    public class ScatteredArcsASCIIGraph
    extends ImmutableSequentialGraph
    An ImmutableGraph that corresponds to a graph stored as a scattered list of arcs.

    A scattered list of arcs describes a graph in a fairly loose way. Each line contains an arc specified as two node identifiers separated by whitespace (but we suggest exactly one TAB character). Sources and targets can be in any order.

    In the standard description, node identifiers can be in the range [-263..263): they will be remapped in a compact identifier space by assigning to each newly seen identifier a new node number. The list of identifiers in order of appearance is available in ids. Lines can be empty, or comments starting with #. Characters following the target will be discarded with a warning.

    Warning: Lines not conforming the above specification will cause an error to be logged, but will be otherwise ignored.

    Alternatively, you can provide an Object2LongFunction<String> with default return value -1 that will be used to map identifiers to node numbers, along with a Charset to parse lines and the number of nodes of the graph (which must be a strict upper bound for the largest value returned by the function).

    A faster conversion can be obtained by providing an Object2LongFunction<byte[]>: in this case, no character decoding is performed, and keys are bytes sequences. This technique goes hand in hand with classes such as GOV3Function, which can store efficiently a map from byte arrays to longs.

    Additionally, the resulting graph can be symmetrized, and its loops be removed, using suitable constructor options.

    This class has no load method, and its main method converts a scattered-arcs representation directly into a BVGraph.

    Using ScatteredArcsASCIIGraph to convert your data

    A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by scattered arcs. Suppose you create the following file, named example.arcs:

      # My graph
      -1 15
      15 2
      2 -1 This will cause a warning to be logged
      OOPS! (This will cause an error to be logged)
      -1 2
     
    Then, the command
      java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example <example.arcs
     
    will produce a compressed graph in BVGraph format with basename example. The file example.ids will contain the list of longs -1, 15, 2. The node with identifer -1 will be the node 0 in the output graph, the node with identifier 15 will be node 1, and the node with identifier 2 will be node 2. The graph example will thus have three nodes and four arcs (viz., <0,1>, <0,2>, <1,2> and <2,0>).

    Memory requirements

    To convert node identifiers to node numbers, instances of this class use a custom map that in the worst case will require 25.5×2⌈log(4n/3)⌉ ≤ 68n bytes, where n is the number of distinct identifiers. Storing batches of arcs in memory requires 16 bytes per arc.

    • Constructor Summary

      Constructors 
      Constructor Description
      ScatteredArcsASCIIGraph​(java.io.InputStream is)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops, int batchSize)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir, it.unimi.dsi.logging.ProgressLogger pl)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function, java.nio.charset.Charset charset, long n, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir, it.unimi.dsi.logging.ProgressLogger pl)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function, long n)
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function, long n, boolean symmetrize)
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops)
      Creates a scattered-arcs ASCII graph.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize)
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir)
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
      ScatteredArcsASCIIGraph​(java.io.InputStream is, it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function, long n, boolean symmetrize, boolean noLoops, int batchSize, java.io.File tempDir, it.unimi.dsi.logging.ProgressLogger pl)
      Creates a scattered-arcs ASCII graph using a function on byte arrays.
    • Field Detail

      • DEFAULT_BATCH_SIZE

        public static final int DEFAULT_BATCH_SIZE
        The default batch size.
        See Also:
        Constant Field Values
      • ids

        public long[][] ids
        The big-array list of identifiers in order of appearance.
    • Constructor Detail

      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a standard scattered list of arcs.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       boolean symmetrize)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a standard scattered list of arcs.
        symmetrize - the new graph will be forced to be symmetric.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       boolean symmetrize,
                                       boolean noLoops)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a standard scattered list of arcs.
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a standard scattered list of arcs.
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize,
                                       java.io.File tempDir)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a standard scattered list of arcs.
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize,
                                       java.io.File tempDir,
                                       it.unimi.dsi.logging.ProgressLogger pl)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a standard scattered list of arcs.
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger, or null.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function,
                                       java.nio.charset.Charset charset,
                                       long n)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
        charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
        n - the number of nodes of the graph (used only if function is not null).
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function,
                                       java.nio.charset.Charset charset,
                                       long n,
                                       boolean symmetrize)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
        charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function,
                                       java.nio.charset.Charset charset,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
        charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function,
                                       java.nio.charset.Charset charset,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
        charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function,
                                       java.nio.charset.Charset charset,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize,
                                       java.io.File tempDir)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
        charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<? extends java.lang.CharSequence> function,
                                       java.nio.charset.Charset charset,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize,
                                       java.io.File tempDir,
                                       it.unimi.dsi.logging.ProgressLogger pl)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from string representing nodes to node numbers, or null for the standard behaviour.
        charset - a character set that will be used to read the identifiers passed to function, or null for ISO-8859-1 (used only if function is not null).
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger, or null.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function,
                                       long n)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph using a function on byte arrays.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from byte arrays representing nodes to node numbers.
        n - the number of nodes of the graph (used only if function is not null).
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function,
                                       long n,
                                       boolean symmetrize)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph using a function on byte arrays.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from byte arrays representing nodes to node numbers.
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from byte arrays representing nodes to node numbers.
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph using a function on byte arrays.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from byte arrays representing nodes to node numbers.
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize,
                                       java.io.File tempDir)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph using a function on byte arrays.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from byte arrays representing nodes to node numbers.
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        Throws:
        java.io.IOException
      • ScatteredArcsASCIIGraph

        public ScatteredArcsASCIIGraph​(java.io.InputStream is,
                                       it.unimi.dsi.fastutil.objects.Object2LongFunction<byte[]> function,
                                       long n,
                                       boolean symmetrize,
                                       boolean noLoops,
                                       int batchSize,
                                       java.io.File tempDir,
                                       it.unimi.dsi.logging.ProgressLogger pl)
                                throws java.io.IOException
        Creates a scattered-arcs ASCII graph using a function on byte arrays.
        Parameters:
        is - an input stream containing a scattered list of arcs.
        function - an explicitly provided function from byte arrays representing nodes to node numbers.
        n - the number of nodes of the graph (used only if function is not null).
        symmetrize - the new graph will be forced to be symmetric.
        noLoops - the new graph will have no loops.
        batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
        tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
        pl - a progress logger, or null.
        Throws:
        java.io.IOException
    • Method Detail

      • numNodes

        public long numNodes()
        Description copied from class: ImmutableGraph
        Returns the number of nodes of this graph.

        Albeit this method is not optional, it is allowed that this method throws an UnsupportedOperationException if this graph has never been entirely traversed using a node iterator. This apparently bizarre behaviour is necessary to support implementations as ArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.

        Specified by:
        numNodes in class ImmutableGraph
        Returns:
        the number of nodes.
      • numArcs

        public long numArcs()
        Description copied from class: ImmutableGraph
        Returns the number of arcs of this graph (optional operation).
        Overrides:
        numArcs in class ImmutableGraph
        Returns:
        the number of arcs.
      • nodeIterator

        public NodeIterator nodeIterator​(long from)
        Description copied from class: ImmutableGraph
        Returns a node iterator for scanning the graph sequentially, starting from the given node.
        Overrides:
        nodeIterator in class ImmutableSequentialGraph
        Parameters:
        from - the node from which the iterator will iterate.
        Returns:
        a NodeIterator for accessing nodes and successors sequentially.
      • main

        public static void main​(java.lang.String[] args)
                         throws java.lang.IllegalArgumentException,
                                java.lang.SecurityException,
                                java.io.IOException,
                                com.martiansoftware.jsap.JSAPException,
                                java.lang.ClassNotFoundException
        Throws:
        java.lang.IllegalArgumentException
        java.lang.SecurityException
        java.io.IOException
        com.martiansoftware.jsap.JSAPException
        java.lang.ClassNotFoundException