Class BVGraph
- All Implemented Interfaces:
it.unimi.dsi.lang.FlyweightPrototype<ImmutableGraph>,CompressionFlags,Serializable
This class provides a flexible and configurable way to store and
access web graphs in a compressed form. Its main method can load an
ImmutableGraph and compress it. The resulting compressed BVGraph is described by a graph file (with extension
.graph), an offset file (with extension
.offsets) and a property file (with extension
.properties). The latter, not surprisingly, is a Java property file.
Optionally, an offset big-list file (with extension
.obl) can be created to load graphs faster.
As a rule of thumb, random access is faster using successors(int), whereas
while iterating using a NodeIterator it is better to use NodeIterator.successorArray().
Parallel compression
Starting with version 3.5.0, this classes uses ImmutableGraph.splitNodeIterators(int) to compress
a graph using parallel compression threads. The number of parallel threads can be set at construction time, or
using the property "it.unimi.dsi.webgraph.threads" from the command line; this approach
is useful with classes such as Transform.
Parallel compression requires the creation of possibly large temporary files. It might be necessary
to set the property java.io.tmpdir to a suitable directory if you experience disk-full errors during compression.
The Graph File
This class stores a graph as an bit stream. The bit stream format depends on a number of parameters and encodings that can be mixed orthogonally. The parameters are:
- the window size, a nonnegative integer;
- the maximum reference count, a positive integer (it is meaningful only when the window is nonzero);
- the minimum interval length, an integer larger than or equal to two, or 0, which is interpreted as infinity.
Successor Lists
The graph file is a sequence of successor lists, one for each node. The list of node x can be thought of as a sequence of natural numbers (even though, as we will explain later, this sequence is further coded suitably as a sequence of bits):
- The outdegree of the node; if it is zero, the list ends here.
- If the window size is not zero, the reference part, that is:
- a nonnegative integer, the reference, which never exceeds the window size; if the reference is r, the list of successors will be specified as a modified version of the list of successors of x−r; if r is 0, then the list of successors will be specified explicitly;
- if r is nonzero:
- a natural number b, the block count;
- a sequence of b natural numbers B1, …, Bb, called the copy-block list; only the first number can be zero.
- Then comes the extra part, specifying some more entries that the list of successors contains (or all of them, if
r is zero), that is:
- If the minimum interval length is finite,
- an integer i, the interval count;
- a sequence of i pairs, whose first component is the left extreme of an interval, and whose second component is the length of the interval (i.e., the number of integers contained in the interval).
- Finally, the list of residuals, which contain all successors not specified by previous methods.
- If the minimum interval length is finite,
The above data should be interpreted as follows:
- The reference part, if present (i.e., if both the window size and the reference are strictly positive), specifies that part of the list of successors of node x−r should be copied; the successors of node x−r that should be copied are described in the copy-block list; more precisely, one should copy the first B1 entries of this list, discard the next B2, copy the next B3 etc. (the last remaining elements of the list of successors will be copied if b is even, and discarded if b is odd).
- The extra part specifies additional successors (or all of them, if the reference part is absent); the extra part is not present
if the number of successors that are to be copied according to the reference part already coincides with the outdegree of x;
the successors listed in the extra part are given in two forms:
- some of them are specified as belonging to (integer) intervals, if the minimum interval length is finite; the interval count indicates how many intervals, and the intervals themselves are listed as pairs (left extreme, length);
- the residuals are the remaining "scattered" successors.
How Successor Lists Are Coded
As we said before, the list of integers corresponding to each successor list should be coded into a sequence of bits.
This is (ideally) done in two phases: we first modify the sequence in a suitable manner (as explained below) so to obtain
another sequence of integers (some of them might be negative). Then each single integer is coded, using a coding that can
be specified as an option; the integers that may be negative are first turned into natural numbers using Fast.int2nat(int).
- The outdegree of the node is left unchanged, as well as the reference and the block count;
- all blocks are decremented by 1, except for the first one;
- the interval count is left unchanged;
- all interval lengths are decremented by the minimum interval length;
- the first left extreme is expressed as its difference from x (it will be negative if the first extreme is less than x); the remaining left extremes are expressed as their distance from the previous right extreme plus 2 (e.g., if the interval is [5..11] and the previous one was [1..3], then the left extreme 5 is expressed as 5-(3+2)=5-5=0);
- the first residual is expressed as its difference from x (it will be negative if the first residual is less than x); the remaining residuals are expressed as decremented differences from the previous residual.
The Offset File
Since the graph is stored as a bit stream, we must have some way to know where each successor list starts. This information is stored in the offset file, which contains the bit offset of each successor list (in particular, the offset of the first successor list will be zero). As a commodity, the offset file contains an additional offset pointing just after the last successor list (providing, as a side-effect, the actual bit length of the graph file). Each offset (except for the first one) is stored as a suitably coded difference from the previous offset.
The list of offsets can be additionally stored as a serialised EliasFanoMonotoneLongBigList
using a suitable command-line option. If the serialised big list is detected, it is loaded instead of parsing the offset list.
The Property File
This file contains self-explaining entries that are necessary to correctly decode the graph and offset files, and moreover give some statistical information about the compressed graph (e.g., the number of bits per link).
nodes- the number of nodes of the graph.
nodes- the number of arcs of the graph.
version- a version number.
graphclass- the name of the class that should load this graph (
ImmutableGraphconvention). bitsperlink- the number of bits per link (overall graph size in bits divided by the number of arcs).
bitspernode- the number of bits per node (overall graph size in bits divided by the number of nodes).
compratio- the ratio between the graph size and the information-theoretical lower bound (the binary logarithm of the number of subsets of size
arcsout of a universe ofnodes2 elements). compressionflags- flags specifying the codes used for the components of the compression algorithm.
zetak- if ζ codes are selected for residuals, the parameter k.
windowsize- the window size.
maxref- the maximum reference count.
minintervallength- the minimum length of an interval.
avgdist- the average distance of a reference.
avgref- the average length of reference chains.
bitsfor*- number of bits used by a specific compoenent of the algorithm (the sum is the number of bits used to store the graph).
avgbitsfor*- number of bits used by a specific compoenent of the algorithm, divided by the number of nodes (the sum is the number of bits per node).
*arcs- the number of arcs stored by each component of the algorithm (the sum is the number of arcs).
*expstats- frequencies of the floor of the logarithm of successor gaps and residual gaps, separated by a comma; the statistics include the gap between each node
and its first successor, after it has been passed through
Fast.int2nat(int), but discarding zeroes (which happen in very rare circumstance, and should be considered immaterial). *avg[log]gap- the average of the gaps (or of their logarithm) of successors and residuals: note that this data is computed from the exponential statistics above, and
thus it is necessarily approximate.
How The Graph File Is Loaded Into Memory
The natural way of using a graph file is to load it into a byte array and
then index its bits using the suitable offset. This class will use a byte
array for graphs smaller than Integer.MAX_VALUE bytes,
and a FastMultiByteArrayInputStream
otherwise: in the latter case, expect a significant slowdown (as
an InputBitStream can wrap directly
a byte array).
Offsets are loaded using an EliasFanoMonotoneLongBigList,
which occupies exponentially less space than the graph itself (unless
your graph is pathologically sparse). There is of course a cost involved in
accessing the list with respect to accessing an array of longs.
Note that by default the EliasFanoMonotoneLongBigList instance is
created from scratch using the file of offsets. This is a long and tedious
process, in particular with large graphs. The main method of this class
has an option that will generate such a list once for all and serialise it in a file with
extension .obl. The list will be quickly deserialised
if its modification date is later than that of the offset file.
Not Loading the Graph File at All
For some applications (such as transposing a graph) it is not necessary to load the graph
file in memory. Since this class is able to enumerate the links of a graph without using random
access, it is possible not to load in memory any information at all, and obtain iterators that
directly read from the graph file. To obtain this effect, you must call loadOffline(CharSequence).
Memory–Mapping a Graph
Another interesting alternative is memory mapping. When using loadMapped(CharSequence),
the graph will be mapped into memory, and the offsets loaded. The graph will provide random access and behave
as if it was loaded into memory, but of course the access will be slower.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class it.unimi.dsi.webgraph.ImmutableGraph
ImmutableGraph.LoadMethod -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected CharSequenceThe basename of the graph.static final intFlag: write block counts using δ coding.static final intFlag: write block counts using γ coding (default).static final intFlag: write block counts using unary coding.protected intThe coding for copy-block lists.protected intThe coding for block counts.static final intFlag: write copy-block lists using δ coding.static final intFlag: write copy-block lists using γ coding (default).static final intThis number classifies the present graph format.protected intIf not -1, the node whose degree is cached incachedOutdegree.protected intIfcachedNodeis notInteger.MIN_VALUE, its cached outdegree.protected longIfcachedNodeis notInteger.MIN_VALUE, the position immediately after the coding of the outdegree ofcachedNode.static final intDefault backward reference maximum length.static final intDefault minimum interval length.static final intDefault window size.static final intDefault value of k.static final StringThe standard extension for the graph bit stream.protected byte[]The byte array storing the compressed graph, ifisMemoryis true andoffsetTypeis not -1.protected it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStreamThe multi-byte array input stream storing the compressed graph, ifisMemoryis false,isMappedis false andoffsetTypeis not -1.protected static final intThe initial length of an array that will contain a successor list.protected booleanWhenoffsetTypeis not -1, whether this graph is directly loaded intographMemory, or rather memory-mapped.protected booleanWhenoffsetTypeis not -1, whether this graph is directly loaded intographMemory, or rather wrapped in aFastMultiByteArrayInputStreamspecified bygraphStream.protected longThe number of arcs of the graph.protected it.unimi.dsi.io.ByteBufferInputStreamThe memory-mapped input stream storing the compressed graph, ifisMappedis true.protected intThe maximum reference count.protected intThe minimum interval length.protected intThe number of nodes of the graph.static final intA special value forminIntervalLengthinterpreted as meaning that the minimum interval length is infinity.static final intThe offset step parameter corresponding to offline load.protected intThe coding for offsets.protected it.unimi.dsi.fastutil.longs.LongBigListThis variable isnulliffoffsetTypeis zero or less (implying that offsets have not been loaded).static final StringThe standard extension for the cachedLongBigListcontaining the graph offsets.static final intFlag: write offsets using δ coding.static final StringThe standard extension for the graph-offsets bit stream.static final intFlag: write offsets using γ coding (default).protected intThe offset type: 2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file.protected intThe coding for outdegrees.static final intFlag: write outdegrees using δ coding.static final StringThe standard extension for the stream of node outdegrees.static final intFlag: write outdegrees using γ coding (default).protected intThe coding for references.static final intFlag: write references using δ coding.static final intFlag: write references using γ coding.static final intFlag: write references using unary coding (default).protected intThe coding for residuals.static final intFlag: write residuals using δ coding.static final intFlag: write residuals using γ coding.static final intFlag: write residuals using Golomb coding.static final intFlag: write residuals using variable-length nibble coding.static final intFlag: write residuals using ζk coding (default).static final intThe offset step parameter corresponding to sequential load.protected intThe window size.protected intThe value of k for ζk coding (for residuals).Fields inherited from class it.unimi.dsi.webgraph.ImmutableGraph
GRAPHCLASS_PROPERTY_KEY, NUMBER_OF_THREADS_PROPERTY, PROPERTIES_EXTENSIONFields inherited from interface it.unimi.dsi.webgraph.CompressionFlags
CODING_NAME, DELTA, GAMMA, GOLOMB, NIBBLE, SKEWED_GOLOMB, UNARY, ZETA -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionbasename()Returns a symbolic basename for this graph (optional operation).copy()Returns a flyweight copy of this immutable graph.booleanWhether the node iterators returned by this graph supportNodeIterator.copy(int).protected static intintervalize(it.unimi.dsi.fastutil.ints.IntArrayList x, int minInterval, it.unimi.dsi.fastutil.ints.IntArrayList left, it.unimi.dsi.fastutil.ints.IntArrayList len, it.unimi.dsi.fastutil.ints.IntArrayList residuals) This method tries to express an increasing sequence of natural numbersxas a union of an increasing sequence of intervals and an increasing sequence of residual elements.static BVGraphload(CharSequence basename) Creates a newBVGraphby loading a compressed graph file from disk to memory, with no progress logger and all offsets.static BVGraphload(CharSequence basename, int offsetType) Creates a newBVGraphby loading a compressed graph file from disk to memory, with no progress logger.static BVGraphload(CharSequence basename, int offsetType, it.unimi.dsi.logging.ProgressLogger pl) Creates a newBVGraphby loading a compressed graph file from disk to memory.static BVGraphload(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) Creates a newBVGraphby loading a compressed graph file from disk to memory, with all offsets.protected BVGraphloadInternal(CharSequence basename, int offsetType, it.unimi.dsi.logging.ProgressLogger pl) Loads a compressed graph file from disk into this graph.static BVGraphloadMapped(CharSequence basename) Creates a newBVGraphby memory-mapping a graph file.static BVGraphloadMapped(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) Creates a newBVGraphby memory-mapping a graph file.static BVGraphloadOffline(CharSequence basename) Creates a newBVGraphby loading just the metadata of a compressed graph file.static BVGraphloadOffline(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) Creates a newBVGraphby loading just the metadata of a compressed graph file.static BVGraphloadSequential(CharSequence basename) Deprecated.static BVGraphloadSequential(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) Deprecated.static voidReads an immutable graph and stores it as aBVGraph.intReturns the maximum reference count of this graph.nodeIterator(int from) This method returns a node iterator for scanning the graph sequentially, starting from the given node.longnumArcs()Returns the number of arcs of this graph (optional operation).intnumNodes()Returns the number of nodes of this graph.intoutdegree(int x) Returns the outdegree of a node.booleanChecks whether this graph provides random access to successor lists.protected final intreadBlock(it.unimi.dsi.io.InputBitStream ibs) Reads a block from the given stream.protected final intreadBlockCount(it.unimi.dsi.io.InputBitStream ibs) Reads a block count from the given stream.protected final longreadLongResidual(it.unimi.dsi.io.InputBitStream ibs) Reads a long residual from the given stream.protected final longreadOffset(it.unimi.dsi.io.InputBitStream ibs) Reads an offset difference from the given stream.protected final intreadOutdegree(it.unimi.dsi.io.InputBitStream ibs) Reads an outdegree from the given stream.protected final intreadOutdegree(it.unimi.dsi.io.InputBitStream ibs, long offset) Reads an outdegree from the given stream at a given offset.protected final intreadReference(it.unimi.dsi.io.InputBitStream ibs) Reads a reference from the given stream.protected final intreadResidual(it.unimi.dsi.io.InputBitStream ibs) Reads a residual from the given stream.static voidstore(ImmutableGraph graph, CharSequence basename) Writes the given graph using a given base name, without any progress logger and with all parameters set to their default values.static voidstore(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags) Writes the given graph using a given base name, without any progress logger.static voidstore(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, int numberOfThreads) Writes the given graph using a given base name, without any progress logger.static voidstore(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, int numberOfThreads, it.unimi.dsi.logging.ProgressLogger pl) Writes the given graph using a given base name.static voidstore(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, it.unimi.dsi.logging.ProgressLogger pl) Writes the given graph using a given base name.static voidstore(ImmutableGraph graph, CharSequence basename, int numberOfThreads, it.unimi.dsi.logging.ProgressLogger pl) Writes the given graph using a given base name, with all parameters set to their default values.static voidstore(ImmutableGraph graph, CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) Writes the given graph using a given base name, with all parameters set to their default values.successors(int x) Returns an iterator over the successors of a given node.protected LazyIntIteratorsuccessors(int x, it.unimi.dsi.io.InputBitStream ibs, int[][] window, int[] outd) Given anInputBitStreamwrapping a graph file, returns an iterator over the successors of a given nodex.protected static voidupdateBins(int currNode, int[] list, int length, long[] bin) Updates a list of exponential bins using the gaps a given list of strinctly increasing integers.intReturns the window size of this graph.protected final intwriteBlock(it.unimi.dsi.io.OutputBitStream obs, int block) Writes a block to the given stream.protected final intwriteBlockCount(it.unimi.dsi.io.OutputBitStream obs, int count) Writes a block count to the given stream.protected final intwriteOffset(it.unimi.dsi.io.OutputBitStream obs, long x) Writes an offset difference to the given stream.voidwriteOffsets(it.unimi.dsi.io.OutputBitStream obs, it.unimi.dsi.logging.ProgressLogger pl) Write the offset file to a given bit stream.protected final intwriteOutdegree(it.unimi.dsi.io.OutputBitStream obs, int d) Writes an outdegree to the given stream.protected final intwriteReference(it.unimi.dsi.io.OutputBitStream obs, int ref) Writes a reference to the given stream.protected final intwriteResidual(it.unimi.dsi.io.OutputBitStream obs, int residual) Writes a residual to the given stream.protected final intwriteResidual(it.unimi.dsi.io.OutputBitStream obs, long residual) Writes a residual to the given stream.Methods inherited from class it.unimi.dsi.webgraph.ImmutableGraph
equals, hashCode, load, loadOnce, nodeIterator, outdegrees, splitNodeIterators, store, store, successorArray, toString
-
Field Details
-
SEQUENTIAL
public static final int SEQUENTIALThe offset step parameter corresponding to sequential load.- See Also:
-
OFFLINE
public static final int OFFLINEThe offset step parameter corresponding to offline load.- See Also:
-
GRAPH_EXTENSION
The standard extension for the graph bit stream.- See Also:
-
OFFSETS_EXTENSION
The standard extension for the graph-offsets bit stream.- See Also:
-
OFFSETS_BIG_LIST_EXTENSION
The standard extension for the cachedLongBigListcontaining the graph offsets.- See Also:
-
OUTDEGREES_EXTENSION
The standard extension for the stream of node outdegrees.- See Also:
-
BVGRAPH_VERSION
public static final int BVGRAPH_VERSIONThis number classifies the present graph format. When new features require introducing binary incompatibilities, this number is bumped so to ensure that old classes do not try to read graphs they cannot understand.- See Also:
-
INITIAL_SUCCESSOR_LIST_LENGTH
protected static final int INITIAL_SUCCESSOR_LIST_LENGTHThe initial length of an array that will contain a successor list.- See Also:
-
NO_INTERVALS
public static final int NO_INTERVALSA special value forminIntervalLengthinterpreted as meaning that the minimum interval length is infinity.- See Also:
-
basename
The basename of the graph. This may benull, but trying to load the graph with an offset step of -1 will cause an exception. -
n
protected int nThe number of nodes of the graph. -
m
protected long mThe number of arcs of the graph. -
isMemory
protected boolean isMemoryWhenoffsetTypeis not -1, whether this graph is directly loaded intographMemory, or rather wrapped in aFastMultiByteArrayInputStreamspecified bygraphStream. -
isMapped
protected boolean isMappedWhenoffsetTypeis not -1, whether this graph is directly loaded intographMemory, or rather memory-mapped. -
graphMemory
protected byte[] graphMemoryThe byte array storing the compressed graph, ifisMemoryis true andoffsetTypeis not -1.This variable is loaded with a copy of the graph file, or with a rearrangement of the latter, depending on whether
offsetTypeis smaller than or equal to one. IfoffsetTypeis -1, this variable isnull, and node iterators are generated by opening streams directly on the graph file. -
graphStream
protected it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream graphStreamThe multi-byte array input stream storing the compressed graph, ifisMemoryis false,isMappedis false andoffsetTypeis not -1.It is loaded with a copy of the graph file. If
offsetTypeis -1, this variable isnull, and node iterators are generated by opening streams directly on the graph file. -
mappedGraphStream
protected it.unimi.dsi.io.ByteBufferInputStream mappedGraphStreamThe memory-mapped input stream storing the compressed graph, ifisMappedis true.It is loaded with a copy of the graph file. If
offsetTypeis -1, this variable isnull, and node iterators are generated by opening streams directly on the graph file. -
offsets
protected it.unimi.dsi.fastutil.longs.LongBigList offsetsThis variable isnulliffoffsetTypeis zero or less (implying that offsets have not been loaded). Otherwise, it is an Elias–Fano monotone list containing the pointers of the bit streams of one eachoffsetTypenodes. -
offsetType
protected int offsetTypeThe offset type: 2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file. -
cachedNode
protected transient int cachedNodeIf not -1, the node whose degree is cached incachedOutdegree. -
cachedOutdegree
protected int cachedOutdegreeIfcachedNodeis notInteger.MIN_VALUE, its cached outdegree. -
cachedPointer
protected long cachedPointerIfcachedNodeis notInteger.MIN_VALUE, the position immediately after the coding of the outdegree ofcachedNode. -
maxRefCount
protected int maxRefCountThe maximum reference count. -
DEFAULT_MAX_REF_COUNT
public static final int DEFAULT_MAX_REF_COUNTDefault backward reference maximum length.- See Also:
-
windowSize
protected int windowSizeThe window size. Zero means no references. -
DEFAULT_WINDOW_SIZE
public static final int DEFAULT_WINDOW_SIZEDefault window size.- See Also:
-
minIntervalLength
protected int minIntervalLengthThe minimum interval length. -
DEFAULT_MIN_INTERVAL_LENGTH
public static final int DEFAULT_MIN_INTERVAL_LENGTHDefault minimum interval length.- See Also:
-
zetaK
protected int zetaKThe value of k for ζk coding (for residuals). -
DEFAULT_ZETA_K
public static final int DEFAULT_ZETA_KDefault value of k.- See Also:
-
OUTDEGREES_GAMMA
public static final int OUTDEGREES_GAMMAFlag: write outdegrees using γ coding (default).- See Also:
-
OUTDEGREES_DELTA
public static final int OUTDEGREES_DELTAFlag: write outdegrees using δ coding.- See Also:
-
BLOCKS_GAMMA
public static final int BLOCKS_GAMMAFlag: write copy-block lists using γ coding (default).- See Also:
-
BLOCKS_DELTA
public static final int BLOCKS_DELTAFlag: write copy-block lists using δ coding.- See Also:
-
RESIDUALS_GAMMA
public static final int RESIDUALS_GAMMAFlag: write residuals using γ coding.- See Also:
-
RESIDUALS_ZETA
public static final int RESIDUALS_ZETAFlag: write residuals using ζk coding (default).- See Also:
-
RESIDUALS_DELTA
public static final int RESIDUALS_DELTAFlag: write residuals using δ coding.- See Also:
-
RESIDUALS_NIBBLE
public static final int RESIDUALS_NIBBLEFlag: write residuals using variable-length nibble coding.- See Also:
-
RESIDUALS_GOLOMB
public static final int RESIDUALS_GOLOMBFlag: write residuals using Golomb coding.- See Also:
-
REFERENCES_GAMMA
public static final int REFERENCES_GAMMAFlag: write references using γ coding.- See Also:
-
REFERENCES_DELTA
public static final int REFERENCES_DELTAFlag: write references using δ coding.- See Also:
-
REFERENCES_UNARY
public static final int REFERENCES_UNARYFlag: write references using unary coding (default).- See Also:
-
BLOCK_COUNT_GAMMA
public static final int BLOCK_COUNT_GAMMAFlag: write block counts using γ coding (default).- See Also:
-
BLOCK_COUNT_DELTA
public static final int BLOCK_COUNT_DELTAFlag: write block counts using δ coding.- See Also:
-
BLOCK_COUNT_UNARY
public static final int BLOCK_COUNT_UNARYFlag: write block counts using unary coding.- See Also:
-
OFFSETS_GAMMA
public static final int OFFSETS_GAMMAFlag: write offsets using γ coding (default).- See Also:
-
OFFSETS_DELTA
public static final int OFFSETS_DELTAFlag: write offsets using δ coding.- See Also:
-
outdegreeCoding
protected int outdegreeCodingThe coding for outdegrees. By default, we use γ coding. -
blockCoding
protected int blockCodingThe coding for copy-block lists. By default, we use γ coding. -
residualCoding
protected int residualCodingThe coding for residuals. By default, we use ζ coding. -
referenceCoding
protected int referenceCodingThe coding for references. By default, we use unary coding. -
blockCountCoding
protected int blockCountCodingThe coding for block counts. By default, we use γ coding. -
offsetCoding
protected int offsetCodingThe coding for offsets. By default, we use γ coding.
-
-
Constructor Details
-
BVGraph
protected BVGraph()
-
-
Method Details
-
copy
Description copied from class:ImmutableGraphReturns a flyweight copy of this immutable graph.- Specified by:
copyin interfaceit.unimi.dsi.lang.FlyweightPrototype<ImmutableGraph>- Specified by:
copyin classImmutableGraph- Returns:
- a flyweight copy of this immutable graph.
- See Also:
-
numNodes
public int numNodes()Description copied from class:ImmutableGraphReturns the number of nodes of this graph.Albeit this method is not optional, it is allowed that this method throws an
UnsupportedOperationExceptionif this graph has never been entirely traversed using anode iterator. This apparently bizarre behaviour is necessary to support implementations asArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.- Specified by:
numNodesin classImmutableGraph- Returns:
- the number of nodes.
-
numArcs
public long numArcs()Description copied from class:ImmutableGraphReturns the number of arcs of this graph (optional operation).- Overrides:
numArcsin classImmutableGraph- Returns:
- the number of arcs.
-
randomAccess
public boolean randomAccess()Description copied from class:ImmutableGraphChecks whether this graph provides random access to successor lists.- Specified by:
randomAccessin classImmutableGraph- Returns:
- true if this graph provides random access to successor lists.
-
hasCopiableIterators
public boolean hasCopiableIterators()Description copied from class:ImmutableGraphWhether the node iterators returned by this graph supportNodeIterator.copy(int).- Overrides:
hasCopiableIteratorsin classImmutableGraph- Returns:
- true if this graph provides copiable iterators.
-
basename
Description copied from class:ImmutableGraphReturns a symbolic basename for this graph (optional operation).- Overrides:
basenamein classImmutableGraph- Returns:
- the basename.
-
maxRefCount
public int maxRefCount()Returns the maximum reference count of this graph.- Returns:
- the maximum reference count.
-
windowSize
public int windowSize()Returns the window size of this graph.- Returns:
- the window size.
-
readOffset
Reads an offset difference from the given stream.- Parameters:
ibs- an offset-file input bit stream.- Returns:
- the next offset difference.
- Throws:
IOException
-
writeOffset
Writes an offset difference to the given stream.- Parameters:
obs- an offset-file output bit stream.x- an offset difference to be stored in the stream.- Returns:
- the number of bits written.
- Throws:
IOException
-
readOutdegree
Reads an outdegree from the given stream.- Parameters:
ibs- a graph-file input bit stream.- Returns:
- the next outdegree.
- Throws:
IOException
-
readOutdegree
protected final int readOutdegree(it.unimi.dsi.io.InputBitStream ibs, long offset) throws IOException Reads an outdegree from the given stream at a given offset.- Parameters:
ibs- a graph-file input bit stream.offset- the offset at which the stream must be positioned.- Returns:
- the next outdegree.
- Throws:
IOException
-
writeOutdegree
Writes an outdegree to the given stream.- Parameters:
obs- a graph-file output bit stream.d- an outdegree to be stored in the stream.- Returns:
- the number of bits written.
- Throws:
IOException
-
readReference
Reads a reference from the given stream.- Parameters:
ibs- a graph-file input bit stream.- Returns:
- the next reference.
- Throws:
IOException
-
writeReference
Writes a reference to the given stream.- Parameters:
obs- a graph-file output bit stream.ref- the reference.- Returns:
- the number of bits written.
- Throws:
IOException
-
readBlockCount
Reads a block count from the given stream.- Parameters:
ibs- a graph-file input bit stream.- Returns:
- the next block count.
- Throws:
IOException
-
writeBlockCount
protected final int writeBlockCount(it.unimi.dsi.io.OutputBitStream obs, int count) throws IOException Writes a block count to the given stream.- Parameters:
obs- a graph-file output bit stream.count- the block count.- Returns:
- the number of written bits.
- Throws:
IOException
-
readBlock
Reads a block from the given stream.- Parameters:
ibs- a graph-file input bit stream.- Returns:
- the next block.
- Throws:
IOException
-
writeBlock
Writes a block to the given stream.- Parameters:
obs- a graph-file output bit stream.block- the block.- Returns:
- the number of written bits.
- Throws:
IOException
-
readResidual
Reads a residual from the given stream.- Parameters:
ibs- a graph-file input bit stream.- Returns:
- the next residual.
- Throws:
IOException
-
readLongResidual
Reads a long residual from the given stream.- Parameters:
ibs- a graph-file input bit stream.- Returns:
- the next residual.
- Throws:
IOException
-
writeResidual
protected final int writeResidual(it.unimi.dsi.io.OutputBitStream obs, int residual) throws IOException Writes a residual to the given stream.- Parameters:
obs- a graph-file output bit stream.residual- the residual.- Returns:
- the number of written bits.
- Throws:
IOException
-
writeResidual
protected final int writeResidual(it.unimi.dsi.io.OutputBitStream obs, long residual) throws IOException Writes a residual to the given stream.- Parameters:
obs- a graph-file output bit stream.residual- the residual.- Returns:
- the number of written bits.
- Throws:
IOException
-
outdegree
Description copied from class:ImmutableGraphReturns the outdegree of a node.- Specified by:
outdegreein classImmutableGraph- Parameters:
x- a node.- Returns:
- the outdegree of the given node.
- Throws:
IllegalStateException- if called without offsets.
-
successors
Returns an iterator over the successors of a given node.- Overrides:
successorsin classImmutableGraph- Parameters:
x- a node.- Returns:
- an iterator over the successors of the node.
-
successors
protected LazyIntIterator successors(int x, it.unimi.dsi.io.InputBitStream ibs, int[][] window, int[] outd) throws IllegalStateException Given anInputBitStreamwrapping a graph file, returns an iterator over the successors of a given nodex.This method can be used in two different ways:
- by providing a node and an input bit stream wrapping a graph file, it is possible to access the successor list of the node (provided that offsets have been loaded);
- by providing additional data, which essentially are used to keep some state about the graph, it is possible to perform an efficient sequential visit of all successor lists (even when no offsets were loaded).
This method may modify the offset and the outdegree caches if
windowisnull.- Parameters:
x- a node.ibs- an input bit stream wrapping a graph file. After this method returns, the state ofibsis undefined: however, after the iterator returned is exhausted,ibswill positioned just after the successor list ofx.window- eithernull, or a double array with the following meaning:window[(x-i) mod windowSize]contains, for allibetween 1 (inclusive) andwindowSize(exclusive), the list of successors of nodex−i. Ifwindowis notnullthenibsmust be positioned before the successor list ofx. This parameter will not be modified.outd- ifwindowis notnull, this is an array with as many elements aswindowSize;outd[(x-i) mod windowSize]contains the outdegree of nodex−iforigreater than 0; at the end, this will be true also foriequal to 0.- Returns:
- an iterator over the successors of
x. - Throws:
IllegalStateException- ifwindowisnullandoffsetTypeis 0.
-
nodeIterator
This method returns a node iterator for scanning the graph sequentially, starting from the given node. It keeps track of a sliding window ofwindowSize()previous successor lists to speed up the iteration of graphs with significant referentiation.- Overrides:
nodeIteratorin classImmutableGraph- Parameters:
from- the node from which the iterator will iterate.- Returns:
- a
NodeIteratorfor accessing nodes and successors sequentially.
-
load
public static BVGraph load(CharSequence basename, int offsetType, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Creates a newBVGraphby loading a compressed graph file from disk to memory.- Parameters:
basename- the basename of the graph.offsetType- the desired offset type (2 is memory mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file).pl- a progress logger used while loading the graph, ornull.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the graph.
-
load
Creates a newBVGraphby loading a compressed graph file from disk to memory, with no progress logger.- Parameters:
basename- the basename of the graph.offsetType- the desired offset type (2 is memory mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file).- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the graph.
-
load
Creates a newBVGraphby loading a compressed graph file from disk to memory, with no progress logger and all offsets.- Parameters:
basename- the basename of the graph.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the graph.
-
load
public static BVGraph load(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Creates a newBVGraphby loading a compressed graph file from disk to memory, with all offsets.- Parameters:
basename- the basename of the graph.pl- a progress logger used while loading the graph, ornull.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the graph.
-
loadMapped
public static BVGraph loadMapped(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Creates a newBVGraphby memory-mapping a graph file.- Parameters:
basename- the basename of the graph.pl- a progress logger used while loading the offsets, ornull.- Returns:
- an
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while memory-mapping the graph or reading the offsets.
-
loadMapped
Creates a newBVGraphby memory-mapping a graph file.- Parameters:
basename- the basename of the graph.- Returns:
- an
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while memory-mapping the graph or reading the offsets.
-
loadSequential
@Deprecated public static BVGraph loadSequential(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Deprecated.Creates a newBVGraphby loading a compressed graph file from disk to memory, without offsets.- Parameters:
basename- the basename of the graph.pl- a progress logger used while loading the graph, ornull.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the graph.
-
loadSequential
Deprecated.UseloadOffline(CharSequence)orloadMapped(CharSequence)instead.Creates a newBVGraphby loading a compressed graph file from disk to memory, with no progress logger and without offsets.- Parameters:
basename- the basename of the graph.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the graph.
-
loadOffline
public static BVGraph loadOffline(CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Creates a newBVGraphby loading just the metadata of a compressed graph file.- Parameters:
basename- the basename of the graph.pl- a progress logger, ornull.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the metadata.
-
loadOffline
Creates a newBVGraphby loading just the metadata of a compressed graph file.- Parameters:
basename- the basename of the graph.- Returns:
- a
BVGraphcontaining the specified graph. - Throws:
IOException- if an I/O exception occurs while reading the metadata.
-
loadInternal
protected BVGraph loadInternal(CharSequence basename, int offsetType, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Loads a compressed graph file from disk into this graph. Note that this method should be called only on a newly created graph.- Parameters:
basename- the basename of the graph.offsetType- the desired offset type (2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file).pl- a progress logger used while loading the graph, ornull.- Returns:
- this graph.
- Throws:
IOException- if an I/O exception occurs while reading the graph.
-
intervalize
protected static int intervalize(it.unimi.dsi.fastutil.ints.IntArrayList x, int minInterval, it.unimi.dsi.fastutil.ints.IntArrayList left, it.unimi.dsi.fastutil.ints.IntArrayList len, it.unimi.dsi.fastutil.ints.IntArrayList residuals) This method tries to express an increasing sequence of natural numbersxas a union of an increasing sequence of intervals and an increasing sequence of residual elements. More precisely, this intervalization works as follows: first, one looks atxas a sequence of intervals (i.e., maximal sequences of consecutive elements); those intervals whose length is ≥minIntervalare stored in the listsleft(the list of left extremes) andlen(the list of lengths; the length of an integer interval is the number of integers in that interval). The remaining integers, called residuals are stored in theresiduallist.Note that the previous content of
left,lenandresidualis lost.- Parameters:
x- the list to be intervalized (an increasing list of natural numbers).minInterval- the least length that a maximal sequence of consecutive elements must have in order for it to be considered as an interval.left- the resulting list of left extremes of the intervals.len- the resulting list of interval lengths.residuals- the resulting list of residuals.- Returns:
- the number of intervals.
-
store
public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, int numberOfThreads, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Writes the given graph using a given base name.- Parameters:
graph- a graph to be compressed.basename- a base name.windowSize- the window size (-1 for the default value).maxRefCount- the maximum reference count (-1 for the default value).minIntervalLength- the minimum interval length (-1 for the default value,NO_INTERVALSto disable).zetaK- the parameter used for residual ζ-coding, if used (-1 for the default value).flags- the flag mask.numberOfThreads- the number of threads to use; if 0 or negative, it will be replaced byRuntime.availableProcessors(). Note that ifImmutableGraph.numNodes()is not implemented bygraph, the number of threads will be automatically set to one, possibly logging a warning.pl- a progress logger to log the state of compression, ornullif no logging is required.- Throws:
IOException- if some exception is raised while writing the graph.
-
store
public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Writes the given graph using a given base name.- Parameters:
graph- a graph to be compressed.basename- a base name.windowSize- the window size (-1 for the default value).maxRefCount- the maximum reference count (-1 for the default value).minIntervalLength- the minimum interval length (-1 for the default value,NO_INTERVALSto disable).zetaK- the parameter used for residual ζ-coding, if used (-1 for the default value).flags- the flag mask.pl- a progress logger to log the state of compression, ornullif no logging is required.- Throws:
IOException- if some exception is raised while writing the graph.
-
store
public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags) throws IOException Writes the given graph using a given base name, without any progress logger.- Parameters:
graph- a graph to be compressed.basename- a base name.windowSize- the window size (-1 for the default value).maxRefCount- the maximum reference count (-1 for the default value).minIntervalLength- the minimum interval length (-1 for the default value,NO_INTERVALSto disable).zetaK- the parameter used for residual ζ-coding, if used (-1 for the default value).flags- the flag mask.- Throws:
IOException- if some exception is raised while writing the graph.
-
store
public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, int numberOfThreads) throws IOException Writes the given graph using a given base name, without any progress logger.- Parameters:
graph- a graph to be compressed.basename- a base name.windowSize- the window size (-1 for the default value).maxRefCount- the maximum reference count (-1 for the default value).minIntervalLength- the minimum interval length (-1 for the default value,NO_INTERVALSto disable).zetaK- the parameter used for residual ζ-coding, if used (-1 for the default value).flags- the flag mask.numberOfThreads- the number of threads to use; if 0 or negative, it will be replaced byRuntime.availableProcessors(). Note that ifImmutableGraph.numNodes()is not implemented bygraph, the number of threads will be automatically set to one, possibly logging a warning.- Throws:
IOException- if some exception is raised while writing the graph.
-
store
public static void store(ImmutableGraph graph, CharSequence basename, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Writes the given graph using a given base name, with all parameters set to their default values.- Parameters:
graph- a graph to be compressed.basename- a base name.pl- a progress logger to log the state of compression, ornullif no logging is required.- Throws:
IOException- if some exception is raised while writing the graph.
-
store
public static void store(ImmutableGraph graph, CharSequence basename, int numberOfThreads, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Writes the given graph using a given base name, with all parameters set to their default values.- Parameters:
graph- a graph to be compressed.basename- a base name.numberOfThreads- the number of threads to use; if 0 or negative, it will be replaced byRuntime.availableProcessors(). Note that ifImmutableGraph.numNodes()is not implemented bygraph, the number of threads will be automatically set to one, possibly logging a warning.pl- a progress logger to log the state of compression, ornullif no logging is required.- Throws:
IOException- if some exception is raised while writing the graph.
-
store
Writes the given graph using a given base name, without any progress logger and with all parameters set to their default values.- Parameters:
graph- a graph to be compressed.basename- a base name.- Throws:
IOException- if some exception is raised while writing the graph.
-
updateBins
protected static void updateBins(int currNode, int[] list, int length, long[] bin) Updates a list of exponential bins using the gaps a given list of strinctly increasing integers.- Parameters:
currNode- the current node.list- a strictly increasing list of integers.length- the number of valid elements inlist.bin- the bins.
-
writeOffsets
public void writeOffsets(it.unimi.dsi.io.OutputBitStream obs, it.unimi.dsi.logging.ProgressLogger pl) throws IOException Write the offset file to a given bit stream.- Parameters:
obs- the output bit stream to which offsets will be written.pl- a progress logger, ornull.- Throws:
IOException
-
main
public static void main(String[] args) throws SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, com.martiansoftware.jsap.JSAPException, ClassNotFoundException, InstantiationException Reads an immutable graph and stores it as aBVGraph.- Throws:
SecurityExceptionIllegalAccessExceptionInvocationTargetExceptionNoSuchMethodExceptionIOExceptioncom.martiansoftware.jsap.JSAPExceptionClassNotFoundExceptionInstantiationException
-
loadOffline(CharSequence)orloadMapped(CharSequence)instead.