Class Summary

All Implemented Interfaces:
Collector.Describable

public class Summary extends SimpleCollector<Summary.Child> implements Collector.Describable
Summary metrics and Histogram metrics can both be used to monitor distributions like latencies or request sizes.

An overview of when to use Summaries and when to use Histograms can be found on https://prometheus.io/docs/practices/histograms.

The following example shows how to measure latencies and request sizes:

 import io.prometheus.client.Summary;

 // Request stands for your application's own request type.
 class YourClass {

   private static final Summary requestLatency = Summary.build()
       .name("requests_latency_seconds")
       .help("request latency in seconds")
       .register();

   private static final Summary receivedBytes = Summary.build()
       .name("requests_size_bytes")
       .help("request size in bytes")
       .register();

   public void processRequest(Request req) {
     Summary.Timer requestTimer = requestLatency.startTimer();
     try {
       // Your code here.
     } finally {
       requestTimer.observeDuration();
       receivedBytes.observe(req.size());
     }
   }
 }
 
The Summary class provides several utility methods for observing values, such as observe(double), startTimer() together with Summary.Timer.observeDuration(), time(Runnable), and time(Callable).
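
To make the behavior of the timing helpers concrete, here is a minimal, library-free sketch. TimingSketch and its observedSeconds list are hypothetical stand-ins, not part of the Prometheus client; the sketch only assumes that a duration is measured around the supplied code and then recorded as one observation in seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical stand-in for a Summary; it only records durations in a list.
final class TimingSketch {
    final List<Double> observedSeconds = new ArrayList<>();

    // Runs the code and returns the measured duration in seconds.
    double time(Runnable timeable) {
        long start = System.nanoTime();
        double elapsed;
        try {
            timeable.run();
        } finally {
            // This sketch records the duration even if the code throws.
            elapsed = (System.nanoTime() - start) / 1e9;
            observedSeconds.add(elapsed);
        }
        return elapsed;
    }

    // Runs the code and returns its result; the duration is recorded as a side effect.
    <E> E time(Callable<E> timeable) {
        long start = System.nanoTime();
        try {
            return timeable.call();
        } catch (Exception e) {
            // Callable.call() may throw checked exceptions; wrap them for the caller.
            throw new RuntimeException(e);
        } finally {
            observedSeconds.add((System.nanoTime() - start) / 1e9);
        }
    }
}
```

With this shape, the Runnable variant hands back the measured duration, while the Callable variant hands back the result of the timed code, which mirrors the return values documented for time(Runnable) and time(Callable).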

By default, Summary metrics provide the count and the sum. For example, if you measure latencies of a REST service, the count tells you how often the REST service was called, and the sum tells you the total response time across all calls. You can calculate the average response time with a Prometheus query that divides the sum by the count.
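
The relationship between count, sum, and average can be illustrated with a small, library-free sketch. CountAndSum is a hypothetical stand-in for the two values a Summary tracks by default, not the Prometheus class itself.

```java
// Hypothetical stand-in for the count and sum that a Summary exposes by default.
final class CountAndSum {
    private long count;   // number of observations
    private double sum;   // total of all observed values

    void observe(double value) {
        count++;
        sum += value;
    }

    long count() {
        return count;
    }

    double sum() {
        return sum;
    }

    // Average = sum / count; NaN while nothing has been observed yet.
    double average() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

On the Prometheus side, the two values are exposed as separate series with the suffixes _count and _sum, so for the example metric the query would divide requests_latency_seconds_sum by requests_latency_seconds_count, usually with rate() applied to both sides to get the average over a recent interval.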

In addition to count and sum, you can configure a Summary to provide quantiles:

 Summary requestLatency = Summary.build()
     .name("requests_latency_seconds")
     .help("Request latency in seconds.")
     .quantile(0.5, 0.01)    // 0.5 quantile (median) with 0.01 allowed error
     .quantile(0.95, 0.005)  // 0.95 quantile with 0.005 allowed error
     // ...
     .register();
 
As an example, a 0.95 quantile of 120ms tells you that 95% of the calls were faster than 120ms, and 5% of the calls were slower than 120ms.

Tracking exact quantiles requires a large amount of memory, because all observations need to be stored in a sorted list. Therefore, we allow an error to significantly reduce memory usage.

In the example, the allowed error of 0.005 means that you will not get the exact 0.95 quantile, but anything between the 0.945 quantile and the 0.955 quantile.

Experiments show that the Summary typically needs to keep fewer than 100 samples to provide that precision, even if you have hundreds of millions of observations.
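
The meaning of the allowed error can be sketched without the library. The code below keeps every observation, which is exactly what the Summary avoids, and computes the range of values that would count as a valid answer for a quantile q with allowed error epsilon. QuantileErrorSketch is a hypothetical helper, and the nearest-rank definition of a quantile is an assumption made for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper: shows which values satisfy "quantile q with allowed error epsilon".
final class QuantileErrorSketch {

    // Exact q-quantile of an already sorted array, using the nearest-rank method.
    static double exactQuantile(double[] sorted, double q) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no observations");
        }
        int rank = (int) Math.ceil(q * sorted.length);               // 1-based rank
        int index = Math.min(Math.max(rank, 1), sorted.length) - 1;  // clamp to valid indices
        return sorted[index];
    }

    // Returns {lower, upper}: the exact (q - epsilon) and (q + epsilon) quantiles.
    // Any reported value inside this range is within the allowed error.
    static double[] acceptableRange(double[] observations, double q, double epsilon) {
        double[] sorted = observations.clone();
        Arrays.sort(sorted);
        double lower = exactQuantile(sorted, Math.max(0.0, q - epsilon));
        double upper = exactQuantile(sorted, Math.min(1.0, q + epsilon));
        return new double[] {lower, upper};
    }
}
```

For 1000 observations, q = 0.95 and epsilon = 0.005, the acceptable range spans roughly the 945th to the 955th value in sorted order, so the reported quantile may be any value in that narrow band.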

There are a few special cases:

  • You can set an allowed error of 0, but then the Summary will keep all observations in memory.
  • You can track the minimum value with .quantile(0.0, 0.0). This special case will not use additional memory even though the allowed error is 0.
  • You can track the maximum value with .quantile(1.0, 0.0). This special case will not use additional memory even though the allowed error is 0.

Typically, you don't want a Summary to represent the entire runtime of the application; you want to look at a reasonable time interval instead. Summary metrics implement a configurable sliding time window:

 Summary requestLatency = Summary.build()
     .name("requests_latency_seconds")
     .help("Request latency in seconds.")
     .maxAgeSeconds(10 * 60)
     .ageBuckets(5)
     // ...
     .register();
 
The default is a time window of 10 minutes and 5 age buckets, i.e. the time window is 10 minutes wide, and we slide it forward every 2 minutes.
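
The arithmetic behind these defaults is maxAgeSeconds / ageBuckets = 600 / 5 = 120 seconds between slides. The sketch below shows one plausible way such a window can work; it is a simplified model under stated assumptions, not the library's implementation. SlidingWindowSketch is a hypothetical class, each bucket holds a plain counter where a Summary would hold a quantile estimator, and the current time is passed in explicitly so the behavior is deterministic.

```java
// Hypothetical model of a sliding time window built from age buckets.
final class SlidingWindowSketch {
    private final long[] bucketCounts;   // one counter per age bucket
    private final long rotationMillis;   // time between two slides of the window
    private int current = 0;             // index of the oldest bucket, used for reads
    private long lastRotationMillis;

    SlidingWindowSketch(long maxAgeSeconds, int ageBuckets, long nowMillis) {
        this.bucketCounts = new long[ageBuckets];
        this.rotationMillis = maxAgeSeconds * 1000 / ageBuckets;
        this.lastRotationMillis = nowMillis;
    }

    long rotationSeconds() {
        return rotationMillis / 1000;
    }

    // Every observation is added to all buckets.
    void observe(long nowMillis) {
        rotate(nowMillis);
        for (int i = 0; i < bucketCounts.length; i++) {
            bucketCounts[i]++;
        }
    }

    // Reads come from the oldest bucket, which covers the widest time span.
    long countInWindow(long nowMillis) {
        rotate(nowMillis);
        return bucketCounts[current];
    }

    // For each elapsed rotation interval, reset the oldest bucket and move on to the next one.
    private void rotate(long nowMillis) {
        while (nowMillis - lastRotationMillis >= rotationMillis) {
            bucketCounts[current] = 0;
            current = (current + 1) % bucketCounts.length;
            lastRotationMillis += rotationMillis;
        }
    }
}
```

In this model an observation stops being reported once every bucket that contained it has been reset, which happens at most maxAgeSeconds after it was recorded. More age buckets make the window slide in smaller steps at the cost of maintaining more buckets per observation.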
  • Field Details

    • quantiles

      final List<CKMSQuantiles.Quantile> quantiles
    • maxAgeSeconds

      final long maxAgeSeconds
    • ageBuckets

      final int ageBuckets
  • Constructor Details

  • Method Details

    • build

      public static Summary.Builder build(String name, String help)
      Return a Builder to allow configuration of a new Summary. Ensures required fields are provided.
      Parameters:
      name - The name of the metric
      help - The help string of the metric
    • build

      public static Summary.Builder build()
      Return a Builder to allow configuration of a new Summary.
    • newChild

      protected Summary.Child newChild()
      Description copied from class: SimpleCollector
      Return a new child, workaround for Java generics limitations.
      Specified by:
      newChild in class SimpleCollector<Summary.Child>
    • observe

      public void observe(double amt)
      Observe the given amount on the summary with no labels.
      Parameters:
      amt - in most cases amt should be >= 0. Negative values are supported, but you should read https://prometheus.io/docs/practices/histograms/#count-and-sum-of-observations for implications and alternatives.
    • startTimer

      public Summary.Timer startTimer()
      Start a timer to track a duration on the summary with no labels.

      Call Summary.Timer.observeDuration() at the end of the code whose duration you want to measure.

    • time

      public double time(Runnable timeable)
      Executes runnable code (e.g. a Java 8 lambda) and observes how long it took to run.
      Parameters:
      timeable - Code that is being timed
      Returns:
      Measured duration in seconds for timeable to complete.
    • time

      public <E> E time(Callable<E> timeable)
      Executes callable code (e.g. a Java 8 lambda) and observes how long it took to run.
      Parameters:
      timeable - Code that is being timed
      Returns:
      Result returned by callable.
    • get

      public Summary.Child.Value get()
      Get the value of the Summary.

      Warning: The definition of Summary.Child.Value is subject to change.

    • collect

      public List<Collector.MetricFamilySamples> collect()
      Description copied from class: Collector
      Return all metrics of this Collector.
      Specified by:
      collect in class Collector
    • describe

      public List<Collector.MetricFamilySamples> describe()
      Description copied from interface: Collector.Describable
      Provide a list of metric families this Collector is expected to return. These should exclude the samples. This is used by the registry to detect collisions and duplicate registrations. Usually custom collectors do not have to implement Describable. If Describable is not implemented and the CollectorRegistry was created with auto describe enabled (which is the case for the default registry), then Collector.collect() will be called at registration time instead of describe(). If this could cause problems, either implement a proper describe() or, if that's not practical, have describe() return an empty list.
      Specified by:
      describe in interface Collector.Describable