Class HITS<V,E>

Type Parameters:
V - the vertex type
E - the edge type
All Implemented Interfaces:
VertexScorer<V,HITS.Scores>, IterativeContext

public class HITS<V,E> extends HITSWithPriors<V,E>
Assigns hub and authority scores to each vertex depending on the topology of the network. The essential idea is that a vertex is a hub to the extent that it links to authoritative vertices, and is an authority to the extent that it links to 'hub' vertices.

The classic HITS algorithm essentially proceeds as follows:

 assign equal initial hub and authority values to each vertex
 repeat
   for each vertex w:
     w.hub = sum over successors x of x.authority
     w.authority = sum over predecessors v of v.hub
   normalize hub and authority scores so that the sum of the squares of each = 1
 until scores converge
 
HITS is somewhat different from random walk/eigenvector-based algorithms such as PageRank in that:
  • there are two mutually recursive scores being calculated, rather than a single value
  • the edge weights are effectively all 1, i.e., they can't be interpreted as transition probabilities. This means that the more inlinks and outlinks that a vertex has, the better, since adding an inlink (or outlink) does not dilute the influence of the other inlinks (or outlinks) as in random walk-based algorithms.
  • the scores cannot be interpreted as posterior probabilities (due to the different normalization)
This implementation has the classic behavior by default. However, it has been generalized somewhat so that it can act in a more "PageRank-like" fashion:
  • this implementation has an optional 'random jump probability' parameter analogous to the 'alpha' parameter used by PageRank. Varying this value between 0 and 1 allows the user to vary between the classic HITS behavior and one in which the scores are smoothed to a uniform distribution. The default value for this parameter is 0 (no random jumps possible).
  • the edge weights can be set to anything the user likes, and in particular they can be set up (e.g. using UniformDegreeWeight) so that the weights of the relevant edges incident to a vertex sum to 1.
  • The vertex score normalization has been factored into its own method so that it can be overridden by a subclass. Thus, for example, since the vertices' values are set to sum to 1 initially, if the weights of the relevant edges incident to a vertex sum to 1, then the vertices' values will continue to sum to 1 if the "sum-of-squares" normalization code is overridden to a no-op. (Other normalization methods may also be employed.)
See Also:
  • "'Authoritative sources in a hyperlinked environment' by Jon Kleinberg, 1997"
  • Constructor Details

    • HITS

      public HITS(Graph<V,E> g, com.google.common.base.Function<E,Double> edge_weights, double alpha)
      Creates an instance for the specified graph, edge weights, and alpha (random jump probability) parameter.
      Parameters:
      g - the input graph
      edge_weights - the weights to use for each edge
      alpha - the probability of a hub giving some authority to all vertices, and of an authority increasing the score of all hubs (not just those connected via links)
    • HITS

      public HITS(Graph<V,E> g, double alpha)
      Creates an instance for the specified graph and alpha (random jump probability) parameter. The edge weights are all set to 1.
      Parameters:
      g - the input graph
      alpha - the probability of a hub giving some authority to all vertices, and of an authority increasing the score of all hubs (not just those connected via links)
    • HITS

      public HITS(Graph<V,E> g)
      Creates an instance for the specified graph. The edge weights are all set to 1 and alpha is set to 0.
      Parameters:
      g - the input graph