Class ArrayDigest


public class ArrayDigest extends AbstractTDigest
Array based implementation of a TDigest.

This implementation is essentially a one-level b-tree in which nodes are collected into pages typically with 32 values per page. Commonly, an ArrayDigest contains 500-3000 centroids. With 32 values per page, we have about 32 values per page and about 30 pages which seems to give a nice balance for speed. Sizes from 4 to 100 are plausible, however.

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
     
    static final int
     
    static final int
     
    static final int
     

    Fields inherited from class com.tdunning.math.stats.AbstractTDigest

    gen, recordAllData
  • Constructor Summary

    Constructors
    Constructor
    Description
    ArrayDigest(int pageSize, double compression)
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    add(double x, int w)
    Adds a sample to a histogram.
    Iterator<com.tdunning.math.stats.ArrayDigest.Index>
    allAfter(double x)
     
    Iterator<com.tdunning.math.stats.ArrayDigest.Index>
    allBefore(double x)
    Returns an iterator which will give each element invalid input: '<'= to x in non-increasing order.
    void
    Outputs a histogram as bytes using a particularly cheesy encoding.
    void
    Serialize this TDigest into a byte buffer.
    int
    Returns an upper bound on the number bytes that will be required to represent this histogram.
    double
    cdf(double x)
    Returns the fraction of all points added which are invalid input: '<'= x.
    com.tdunning.math.stats.ArrayDigest.Index
    ceiling(double x)
     
    int
    The number of centroids currently in the TDigest.
    Iterable<? extends Centroid>
    An iterable that lets you go through the centroids in ascending order by mean.
    void
    Re-examines a t-digest to determine whether some centroids are redundant.
    void
     
    double
    Returns the current compression factor.
    int
    count(com.tdunning.math.stats.ArrayDigest.Index index)
     
    com.tdunning.math.stats.ArrayDigest.Index
    floor(double x)
    Returns a cursor pointing to the first element invalid input: '<'= x.
    Reads a histogram from a byte buffer
    long
    headSum(com.tdunning.math.stats.ArrayDigest.Index limit)
     
    com.tdunning.math.stats.ArrayDigest.Index
    increment(com.tdunning.math.stats.ArrayDigest.Index x, int delta)
     
    double
    mean(com.tdunning.math.stats.ArrayDigest.Index index)
     
    double
    quantile(double q)
    Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
    long
    Returns the number of points that have been added to this TDigest.
    int
    Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.

    Methods inherited from class com.tdunning.math.stats.AbstractTDigest

    add, add, createCentroid, decode, encode, interpolate, isRecording, merge, recordAllData

    Methods inherited from class com.tdunning.math.stats.TDigest

    checkValue, createArrayDigest, createArrayDigest, createTreeDigest

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • ArrayDigest

      public ArrayDigest(int pageSize, double compression)
  • Method Details

    • add

      public void add(double x, int w)
      Description copied from class: TDigest
      Adds a sample to a histogram.
      Specified by:
      add in class TDigest
      Parameters:
      x - The value to add.
      w - The weight of this point.
    • headSum

      public long headSum(com.tdunning.math.stats.ArrayDigest.Index limit)
    • mean

      public double mean(com.tdunning.math.stats.ArrayDigest.Index index)
    • count

      public int count(com.tdunning.math.stats.ArrayDigest.Index index)
    • compress

      public void compress()
      Description copied from class: TDigest
      Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.

      The cost is roughly the same as adding as many data points as there are centroids. This is typically invalid input: '<' 10 * compression, but could be as high as 100 * compression.

      This is a destructive operation that is not thread-safe.

      Specified by:
      compress in class TDigest
    • compress

      public void compress(GroupTree other)
      Specified by:
      compress in class AbstractTDigest
    • size

      public long size()
      Description copied from class: TDigest
      Returns the number of points that have been added to this TDigest.
      Specified by:
      size in class TDigest
      Returns:
      The sum of the weights on all centroids.
    • cdf

      public double cdf(double x)
      Description copied from class: TDigest
      Returns the fraction of all points added which are invalid input: '<'= x.
      Specified by:
      cdf in class TDigest
    • quantile

      public double quantile(double q)
      Description copied from class: TDigest
      Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
      Specified by:
      quantile in class TDigest
      Parameters:
      q - The desired fraction
      Returns:
      The value x such that cdf(x) == q
    • centroidCount

      public int centroidCount()
      Description copied from class: TDigest
      The number of centroids currently in the TDigest.
      Specified by:
      centroidCount in class TDigest
      Returns:
      The number of centroids
    • centroids

      public Iterable<? extends Centroid> centroids()
      Description copied from class: TDigest
      An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
      Specified by:
      centroids in class TDigest
      Returns:
      The centroids in the form of an Iterable.
    • allAfter

      public Iterator<com.tdunning.math.stats.ArrayDigest.Index> allAfter(double x)
    • floor

      public com.tdunning.math.stats.ArrayDigest.Index floor(double x)
      Returns a cursor pointing to the first element invalid input: '<'= x. Exposed only for testing.
      Parameters:
      x - The value used to find the cursor.
      Returns:
      The cursor.
    • ceiling

      public com.tdunning.math.stats.ArrayDigest.Index ceiling(double x)
    • allBefore

      public Iterator<com.tdunning.math.stats.ArrayDigest.Index> allBefore(double x)
      Returns an iterator which will give each element invalid input: '<'= to x in non-increasing order.
      Parameters:
      x - The upper bound of all returned elements
      Returns:
      An iterator that returns elements in non-increasing order.
    • increment

      public com.tdunning.math.stats.ArrayDigest.Index increment(com.tdunning.math.stats.ArrayDigest.Index x, int delta)
    • compression

      public double compression()
      Description copied from class: TDigest
      Returns the current compression factor.
      Specified by:
      compression in class TDigest
      Returns:
      The compression factor originally used to set up the TDigest.
    • byteSize

      public int byteSize()
      Returns an upper bound on the number bytes that will be required to represent this histogram.
      Specified by:
      byteSize in class TDigest
      Returns:
      The number of bytes required.
    • smallByteSize

      public int smallByteSize()
      Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
      Specified by:
      smallByteSize in class TDigest
      Returns:
      The number of bytes required.
    • asBytes

      public void asBytes(ByteBuffer buf)
      Outputs a histogram as bytes using a particularly cheesy encoding.
      Specified by:
      asBytes in class TDigest
      Parameters:
      buf - The byte buffer into which the TDigest should be serialized.
    • asSmallBytes

      public void asSmallBytes(ByteBuffer buf)
      Description copied from class: TDigest
      Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.
      Specified by:
      asSmallBytes in class TDigest
      Parameters:
      buf - The byte buffer into which the TDigest should be serialized.
    • fromBytes

      public static ArrayDigest fromBytes(ByteBuffer buf)
      Reads a histogram from a byte buffer
      Returns:
      The new histogram structure