Regression Transform

The regression transform (available in version 5.4 and later) fits two-dimensional regression models to smooth and predict data. It can fit one model per group of input data and generate new data objects representing points along summary trend lines. Alternatively, it can generate a set of objects containing the fitted model parameters, one per group.

This transform supports parametric models for the following functional forms:

  • linear (linear): y = a + b * x
  • logarithmic (log): y = a + b * log(x)
  • exponential (exp): y = a * e^(b * x)
  • power (pow): y = a * x^b
  • quadratic (quad): y = a + b * x + c * x^2
  • polynomial (poly): y = a + b * x + … + k * x^order
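The functional forms above can be made concrete with a small evaluator. This is a minimal Python sketch, not Vega's implementation: the `predict` function and its `method`/`coef` arguments are hypothetical, coefficients are assumed to be ordered intercept first (matching the coef description under the params parameter), and log is assumed to be the natural logarithm.

```python
import math

# Hypothetical helper: evaluate each documented functional form given its
# fitted coefficients, ordered intercept first, then higher-order terms.
def predict(method, coef, x):
    if method == "linear":
        a, b = coef
        return a + b * x
    if method == "log":
        a, b = coef
        return a + b * math.log(x)  # assumed natural log; requires x > 0
    if method == "exp":
        a, b = coef
        return a * math.exp(b * x)
    if method == "pow":
        a, b = coef
        return a * x ** b
    if method in ("quad", "poly"):
        # Horner's rule over coefficients a, b, c, ... of increasing power
        y = 0.0
        for c in reversed(coef):
            y = y * x + c
        return y
    raise ValueError(f"unknown method: {method}")
```

For example, `predict("quad", [1, 2, 3], 2)` evaluates 1 + 2 * 2 + 3 * 2^2.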

All models are fit using ordinary least squares. For non-parametric locally weighted regression, see the loess transform.
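For the simplest (linear) model, the ordinary least squares fit has a closed form. The following is a minimal Python sketch assuming plain, unweighted OLS; `fit_linear` is a hypothetical name, and the returned `coef`/`rSquared` keys only echo the output described for the params parameter. Vega's own fitting code may differ in numerical details.

```python
# Hypothetical sketch: closed-form OLS for y = a + b * x, plus the
# coefficient of determination (R^2). Not Vega's source code.
def fit_linear(xs, ys):
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = my - b * mx
    # R^2 = 1 - SS_residual / SS_total
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    # R^2 is undefined when all y values are equal; the fit is then exact,
    # so this sketch reports 1.0 in that case.
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return {"coef": [a, b], "rSquared": r_squared}
```

On points lying exactly on y = 1 + 2x, such as xs = [0, 1, 2, 3] and ys = [1, 3, 5, 7], this returns coefficients [1.0, 2.0] and an rSquared of 1.0.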

Transform Parameters

Property Type Description
x Field Required. The data field for predictor (independent) values, typically associated with the x-axis.
y Field Required. The data field for predicted (dependent) values, typically associated with the y-axis.
groupby Field[ ] The data fields to group by. If not specified, a single group containing all data objects will be used.
method String The type of regression model to use. One of linear (default), log, exp, pow, quad, or poly.
order Number The polynomial order (degree) for the poly method. A polynomial of order k has k + 1 coefficients.
extent Number[ ] A [min, max] domain over the x field specifying the starting and ending points of the generated trend line.
params Boolean A boolean flag indicating whether the transform should return the fitted model parameters (one object per group) rather than trend line points. Each resulting object includes a coef array of fitted coefficient values (starting with the intercept term, followed by terms of increasing order) and an rSquared value (the coefficient of determination, i.e., the proportion of variance explained by the model).
as String[ ] The output field names for the predictor and predicted values of the line of best fit. If unspecified, the x and y parameter field names are used.
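The parameter table can be read as a typed configuration. The sketch below is a hypothetical Python mirror of those parameters, not a Vega API: the `RegressionParams` class, the `as_` spelling (because `as` is a Python keyword), and the validation rules are assumptions drawn only from the table's descriptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Method names listed in the parameter table.
METHODS = {"linear", "log", "exp", "pow", "quad", "poly"}

@dataclass
class RegressionParams:
    """Hypothetical mirror of the documented parameters (not a Vega API)."""
    x: str                                  # required predictor field
    y: str                                  # required predicted field
    groupby: List[str] = field(default_factory=list)
    method: str = "linear"
    order: Optional[int] = None             # only meaningful for method "poly"
    extent: Optional[List[float]] = None    # [min, max] over the x field
    params: bool = False
    as_: Optional[List[str]] = None         # "as" is a Python keyword

    def __post_init__(self):
        if self.method not in METHODS:
            raise ValueError(f"method must be one of {sorted(METHODS)}")
        # Assumption: treat order as a positive integer degree.
        if self.order is not None and (not isinstance(self.order, int) or self.order < 1):
            raise ValueError("order must be a positive integer")
        if self.extent is not None and len(self.extent) != 2:
            raise ValueError("extent must be a [min, max] pair")

    def output_fields(self):
        # If "as" is unspecified, fall back to the x and y field names.
        return list(self.as_) if self.as_ else [self.x, self.y]
```

An empty groupby list models the documented default of a single group containing all data objects.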

Usage

Linear Regression

Fit a linear regression model that predicts the field dv as a function of iv. Generates a new data stream with points for a regression line that extends from -5 to 5 over the domain of iv:

{
  "type": "regression",
  "method": "linear",
  "x": "iv",
  "y": "dv",
  "extent": [-5, 5]
}

The resulting points can then be visualized with a line mark.
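For a straight line, sampling the fitted model at the two ends of the extent is enough to draw it. The Python sketch below is hypothetical (`trend_points` and its arguments are illustrative names) and samples evenly across the [min, max] extent. Curved models need more samples, and how the transform itself chooses them is not described here.

```python
# Hypothetical sketch: sample a fitted linear model y = a + b * x across an
# [min, max] extent, producing objects a line mark could draw. The output
# field names play the role of the "as" parameter.
def trend_points(coef, extent, x_field, y_field, samples=2):
    a, b = coef
    lo, hi = extent
    if samples < 2:
        raise ValueError("need at least two samples to draw a line")
    step = (hi - lo) / (samples - 1)
    points = []
    for i in range(samples):
        x = lo + i * step  # evenly spaced, endpoints included
        points.append({x_field: x, y_field: a + b * x})
    return points
```

With coefficients [1, 2] and the extent [-5, 5] from the example, `trend_points([1, 2], [-5, 5], "iv", "dv")` yields the endpoints (-5, -9) and (5, 11).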

Model Parameters

Fit a fourth-order polynomial regression model that predicts the field dv as a function of iv, with separate models for each value of the groupby field key:

{
  "type": "regression",
  "method": "poly",
  "groupby": ["key"],
  "x": "iv",
  "y": "dv",
  "order": 4,
  "params": true
}

Because params is true, this example returns one object per group instead of trend line points. Each object contains the fields coef (an array of fitted model coefficients) and rSquared (the coefficient of determination, i.e., the proportion of variance explained by the model).
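The grouped params output can be approximated in plain Python. This is a sketch under stated assumptions, not Vega's implementation: `fit_poly`, `regression_params`, and `_solve` are hypothetical names; the fit solves the OLS normal equations directly, which becomes ill-conditioned at high orders (a production implementation may use a more careful formulation); order is treated as the polynomial degree, so order k yields k + 1 coefficients; and storing the group values under a `keys` field is an assumption about the output shape.

```python
from collections import defaultdict

def _solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct x values for this order")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol

def fit_poly(xs, ys, order):
    """OLS polynomial fit via the normal equations; coef[0] is the intercept."""
    k = order + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    coef = _solve(xtx, xty)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - sum(c * x ** j for j, c in enumerate(coef))) ** 2
                 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # If all y values are equal the fit is exact; report 1.0 by convention.
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return {"coef": coef, "rSquared": r_squared}

def regression_params(rows, x, y, groupby, order):
    """One {keys, coef, rSquared} object per group, mimicking params: true."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[g] for g in groupby)].append(row)
    out = []
    for key, members in groups.items():
        model = fit_poly([r[x] for r in members], [r[y] for r in members], order)
        out.append({"keys": list(key), **model})
    return out
```

Calling `regression_params(rows, "iv", "dv", ["key"], 4)` would mirror the example above, though each group then needs at least five distinct iv values for the system to be solvable.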