Density Transform

The density transform generates a new data stream of uniformly spaced samples drawn from a one-dimensional probability density function (pdf) or cumulative distribution function (cdf). This transform is useful for representing probability distributions and for generating continuous distributions from discrete samples using kernel density estimation.

Transform Parameters

| Property | Type | Description |
| --- | --- | --- |
| distribution | Distribution | Required. An object describing the distribution type and parameters. See the distribution reference for details. |
| extent | Number[] | A [min, max] domain from which to sample the distribution. Required unless the distribution can deduce its own extent (namely, kde). |
| method | String | The type of distribution to generate. One of pdf (default) or cdf. |
| steps | Number | The number of uniformly spaced steps to take along the extent domain (default 100). A total of steps + 1 uniformly spaced samples are drawn from the distribution. |
| as | String[] | The output fields for the sample value and associated probability. The default is ["value", "density"]. |
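
As a sketch of how these parameters combine, the following definition samples the cdf rather than the pdf of a normal distribution left at its default mean and standard deviation. The extent, the step count, and the output field names "x" and "cumulative" are illustrative choices, not defaults.

```json
{
  "type": "density",
  "method": "cdf",
  "extent": [-3, 3],
  "steps": 20,
  "as": ["x", "cumulative"],
  "distribution": {
    "function": "normal"
  }
}
```

With steps set to 20, the output should contain 21 samples spaced 0.3 apart along [-3, 3], each with an "x" field and a "cumulative" field.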

Distribution Reference

# normal

Represents a normal (Gaussian) probability distribution with a specified mean and standard deviation stdev.

| Property | Type | Description |
| --- | --- | --- |
| function | String | The value "normal". |
| mean | Number | The mean of the distribution (default 0). |
| stdev | Number | The standard deviation of the distribution (default 1). |

# uniform

Represents a continuous uniform probability distribution over the interval [min, max).

| Property | Type | Description |
| --- | --- | --- |
| function | String | The value "uniform". |
| min | Number | The minimum value (default 0). |
| max | Number | The maximum value (default 1). |

# kde

Represents a kernel density estimate for a set of numerical values. This method uses a Gaussian kernel to estimate a smoothed, continuous probability distribution.

| Property | Type | Description |
| --- | --- | --- |
| function | String | The value "kde". |
| from | Data | The name of the data set to analyze. |
| field | Field | The data field containing the values to model. |
| bandwidth | Number | An optional parameter that determines the width of the Gaussian kernel. If set to 0 (the default), the bandwidth value will be automatically estimated from the input data. |
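
A minimal sketch of a kde definition that fixes the kernel bandwidth instead of relying on the automatic estimate. The data set name "table" and the field "value" are placeholders, and the bandwidth of 0.5 and the extent [0, 10] are arbitrary illustrative values.

```json
{
  "type": "density",
  "extent": [0, 10],
  "distribution": {
    "function": "kde",
    "from": "table",
    "field": "value",
    "bandwidth": 0.5
  }
}
```

Because extent is given explicitly here, sampling covers [0, 10] rather than an extent deduced from the data. A larger bandwidth yields a smoother estimate, and a smaller one follows the individual data values more closely.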

# mixture

Represents a (weighted) mixture of probability distributions. The distributions argument should be an array of distribution objects. The optional weights array provides proportional numerical weights for each distribution.

| Property | Type | Description |
| --- | --- | --- |
| function | String | The value "mixture". |
| distributions | Distribution[] | An array of distribution definition objects. |
| weights | Number[] | An optional array of weights for each distribution. If provided, the values will be normalized to ensure that weights sum to 1. Any unspecified weight values default to 1 prior to normalization. |
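
As an illustration, the following sketch mixes a normal and a uniform distribution using the properties described above. All parameter values are arbitrary and chosen only for demonstration.

```json
{
  "type": "density",
  "extent": [0, 10],
  "distribution": {
    "function": "mixture",
    "distributions": [
      {"function": "normal", "mean": 3, "stdev": 1},
      {"function": "uniform", "min": 6, "max": 9}
    ],
    "weights": [3, 1]
  }
}
```

The weights [3, 1] sum to 4, so after normalization the normal component should contribute with weight 0.75 and the uniform component with weight 0.25.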

Usage

{
  "type": "density",
  "extent": [0, 10],
  "distribution": {
    "function": "normal",
    "mean": 5,
    "stdev": 2
  }
}

Generates a data stream of data objects that sample the pdf of a normal distribution with mean 5 and standard deviation 2, taking the default 100 uniformly spaced steps (101 samples) along the domain [0, 10].

{
  "type": "density",
  "steps": 200,
  "distribution": {
    "function": "kde",
    "from": "table",
    "field": "value"
  }
}

Performs kernel density estimation (with automatically selected bandwidth) over the numbers in the field value in the data stream named table. Generates a data stream by taking 200 uniformly spaced steps (201 samples) between the minimum and maximum observed data values.