KDE Transform

The kde transform (available in version 5.4 and later) performs one-dimensional kernel density estimation over an input data stream and generates uniformly spaced samples of the estimated densities. Unlike the related density transform, this transform supports groupby functionality and can scale densities to convey either probabilities or smoothed counts.

Transform Parameters

| Property | Type | Description |
| --- | --- | --- |
| field | Field | Required. The data field for which to perform density estimation. |
| groupby | Field[ ] | The data fields to group by. If not specified, a single group containing all data objects will be used. |
| cumulative | Boolean | A boolean flag indicating whether to produce density estimates (false, default) or cumulative density estimates (true). |
| counts | Boolean | A boolean flag indicating if the output values should be probability estimates (false, default) or smoothed counts (true). |
| bandwidth | Number | An optional parameter that determines the width of the Gaussian kernel. If set to 0 (the default), the bandwidth value is automatically estimated from the input data using Scott’s method. |
| extent | Number[ ] | A [min, max] domain from which to sample the distribution. If unspecified, the extent will be determined by the minimum and maximum values of the observed value field. |
| minsteps | Number | The minimum number of samples (default 25) to take along the extent domain for plotting the density. |
| maxsteps | Number | The maximum number of samples (default 200) to take along the extent domain for plotting the density. |
| steps | Number | The exact number of samples to take along the extent domain for plotting the density. If specified, overrides both minsteps and maxsteps to set an exact number of uniform samples. Potentially useful in conjunction with a fixed extent to ensure consistent sample points for stacked densities. |
| as | String[ ] | The output fields for the sample value and associated probability. The default is ["value", "density"]. |
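
As a sketch of how several of these parameters can be combined, the definition below fixes both extent and steps so that every group is sampled at the same points, and sets counts to true so that the densities are scaled as smoothed counts, which suits stacking. The field names value and key, the [0, 100] domain, and the steps value of 100 are illustrative assumptions rather than defaults.

```json
{
  "type": "kde",
  "field": "value",
  "groupby": ["key"],
  "extent": [0, 100],
  "steps": 100,
  "counts": true,
  "as": ["value", "density"]
}
```

Leaving bandwidth unset here keeps the default of 0, so the kernel width is still estimated automatically from the input data.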

Usage

The following transform performs kernel density estimation (with an automatically selected bandwidth) over the numbers in the value field of the input data stream, producing a separate density estimate for each group defined by the key field:

{
  "type": "kde",
  "groupby": ["key"],
  "field": "value"
}
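
For context, a transform like this is typically placed in the transform array of a data set definition. The sketch below assumes a hypothetical source data set named table and a hypothetical derived data set named density. It also writes out the documented defaults for bandwidth, cumulative, counts, and as, so the resulting behavior should match the shorter form above.

```json
{
  "name": "density",
  "source": "table",
  "transform": [
    {
      "type": "kde",
      "groupby": ["key"],
      "field": "value",
      "bandwidth": 0,
      "cumulative": false,
      "counts": false,
      "as": ["value", "density"]
    }
  ]
}
```

With these settings, each output object should carry a sample position in value and the corresponding estimate in density, which marks can then encode on the x and y channels.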