Class DBSCANClusterer<T extends Clusterable>

  • Type Parameters:
    T - type of the points to cluster

    public class DBSCANClusterer<T extends Clusterable>
    extends Clusterer<T>
    DBSCAN (density-based spatial clustering of applications with noise) algorithm.

    The DBSCAN algorithm forms clusters based on the idea of density connectivity, i.e. a point p is density connected to another point q, if there exists a chain of points pi, with i = 1 .. n and p1 = p and pn = q, such that each pair <pi, pi+1> is directly density-reachable. A point q is directly density-reachable from point p if it is in the ε-neighborhood of this point.

    Any point that is not density-reachable from a formed cluster is treated as noise, and will thus not be present in the result.

    The algorithm requires two parameters:

    • eps: the distance that defines the ε-neighborhood of a point
    • minPoints: the minimum number of density-connected points required to form a cluster
    Since:
    3.2
    See Also:
    DBSCAN (wikipedia), A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      private static class  DBSCANClusterer.PointStatus
      Status of a point during the clustering process.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private double eps
      Maximum radius of the neighborhood to be considered.
      private int minPts
      Minimum number of points needed for a cluster.
    • Constructor Summary

      Constructors 
      Constructor Description
      DBSCANClusterer​(double eps, int minPts)
      Creates a new instance of a DBSCANClusterer.
      DBSCANClusterer​(double eps, int minPts, DistanceMeasure measure)
      Creates a new instance of a DBSCANClusterer.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.util.List<Cluster<T>> cluster​(java.util.Collection<T> points)
      Performs DBSCAN cluster analysis.
      private Cluster<T> expandCluster​(Cluster<T> cluster, T point, java.util.List<T> neighbors, java.util.Collection<T> points, java.util.Map<Clusterable,​DBSCANClusterer.PointStatus> visited)
      Expands the cluster to include density-reachable items.
      double getEps()
      Returns the maximum radius of the neighborhood to be considered.
      int getMinPts()
      Returns the minimum number of points needed for a cluster.
      private java.util.List<T> getNeighbors​(T point, java.util.Collection<T> points)
      Returns a list of density-reachable neighbors of a point.
      private java.util.List<T> merge​(java.util.List<T> one, java.util.List<T> two)
      Merges two lists together.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • eps

        private final double eps
        Maximum radius of the neighborhood to be considered.
      • minPts

        private final int minPts
        Minimum number of points needed for a cluster.
    • Constructor Detail

      • DBSCANClusterer

        public DBSCANClusterer​(double eps,
                               int minPts)
                        throws NotPositiveException
        Creates a new instance of a DBSCANClusterer.

        The euclidean distance will be used as default distance measure.

        Parameters:
        eps - maximum radius of the neighborhood to be considered
        minPts - minimum number of points needed for a cluster
        Throws:
        NotPositiveException - if eps < 0.0 or minPts < 0
      • DBSCANClusterer

        public DBSCANClusterer​(double eps,
                               int minPts,
                               DistanceMeasure measure)
                        throws NotPositiveException
        Creates a new instance of a DBSCANClusterer.
        Parameters:
        eps - maximum radius of the neighborhood to be considered
        minPts - minimum number of points needed for a cluster
        measure - the distance measure to use
        Throws:
        NotPositiveException - if eps < 0.0 or minPts < 0
    • Method Detail

      • getEps

        public double getEps()
        Returns the maximum radius of the neighborhood to be considered.
        Returns:
        maximum radius of the neighborhood
      • getMinPts

        public int getMinPts()
        Returns the minimum number of points needed for a cluster.
        Returns:
        minimum number of points needed for a cluster
      • expandCluster

        private Cluster<T> expandCluster​(Cluster<T> cluster,
                                         T point,
                                         java.util.List<T> neighbors,
                                         java.util.Collection<T> points,
                                         java.util.Map<Clusterable,​DBSCANClusterer.PointStatus> visited)
        Expands the cluster to include density-reachable items.
        Parameters:
        cluster - Cluster to expand
        point - Point to add to cluster
        neighbors - List of neighbors
        points - the data set
        visited - the set of already visited points
        Returns:
        the expanded cluster
      • getNeighbors

        private java.util.List<T> getNeighbors​(T point,
                                               java.util.Collection<T> points)
        Returns a list of density-reachable neighbors of a point.
        Parameters:
        point - the point to look for
        points - possible neighbors
        Returns:
        the List of neighbors
      • merge

        private java.util.List<T> merge​(java.util.List<T> one,
                                        java.util.List<T> two)
        Merges two lists together.
        Parameters:
        one - first list
        two - second list
        Returns:
        merged lists