Impute

To impute missing data in Vega-Lite, you can use the impute transform, either via an encoding field definition or via a transform array.

The impute transform groups data and determines missing values of the key field within each group. For each missing value in each group, the impute transform will produce a new tuple with the imputed field generated based on a specified imputation method (by using a constant value or by calculating statistics such as mean within each group).

Impute in Encoding Field Definition

// A Single View or a Layer Specification
{
  ...,
  "mark/layer": ...,
  "encoding": {
    "x": {
      "field": ...,
      "type": "quantitative",
      "impute": {...},               // Impute
      ...
    },
    "y": ...,
    ...
  },
  ...
}

An encoding field definition can include an impute definition object to generate new data objects in place of the missing data.

The impute definition object can contain the following properties:

Property Type Description
frame (Null | Number)[]

A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value [-5, 5] indicates that the window should include five objects preceding and five objects following the current object.

Default value: [null, null], indicating that the window includes all objects.

keyvals Any[] | ImputeSequence

Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence.

If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the key field. For impute in encoding, the key field is the x-field if the y-field is imputed, or vice versa.

If there is no impute grouping, this property must be specified.

method String

The imputation method to use for the field value of imputed data objects. One of "value", "mean", "median", "max" or "min".

Default value: "value"

value Any

The field value to use when the imputation method is "value".

For impute in encoding, the grouping fields and the key field (used to identify missing values) are determined automatically. Values are grouped by the fields of the mark property channels, the key channel, and the detail channel. If the x-field is imputed, the y-field is the key field, and vice versa. Each key value that is missing from a group then produces a new tuple in that group with an imputed value.

In this example, we impute the y-field ("b"), so the x-field ("a") will be used as the "key". The values are then grouped by the field "c" of the color encoding. The impute transform then determines missing key values within each group. In this case, the data tuple where "a" is 3 and "c" is 1 is missing, so a new tuple with "a" = 3, "c" = 1, and "b" = 0 will be added.
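A minimal Vega-Lite sketch of this example follows. The data values are illustrative assumptions; only the missing tuple with "a" = 3 and "c" = 1 comes from the description above, and the top-level "description" restates that assumption inside the spec. Because "method" defaults to "value", setting "value": 0 is enough to fill the missing "b" with 0.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative data: the tuple with a = 3 and c = 1 is missing, so impute adds it with b = 0.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative", "impute": {"value": 0}},
    "color": {"field": "c", "type": "nominal"}
  }
}
```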

Besides imputing with a constant value, we can also use a method (such as "mean") on existing data points to generate the missing data.
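As a sketch, assuming the same illustrative data as a two-group line chart, only the impute definition changes: with "method": "mean" and the default frame of [null, null], the missing "b" for "a" = 3, "c" = 1 is expected to be the mean of the observed "b" values in the "c" = 1 group.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative data: the missing tuple (a = 3, c = 1) gets the mean of b within group c = 1.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative", "impute": {"method": "mean"}},
    "color": {"field": "c", "type": "nominal"}
  }
}
```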

The frame property of impute can be used to control the window over which the specified method is applied. Here, the frame is [-2, 2] which indicates that the window for calculating mean includes two objects preceding and two objects following the current object.
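A minimal sketch of the windowed variant, again assuming the illustrative data used for the constant-value case: adding "frame": [-2, 2] restricts the mean to at most two objects on either side of the imputed object within its "c" group, instead of the whole group.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative data: the missing tuple (a = 3, c = 1) gets the mean of b over a [-2, 2] window within group c = 1.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean", "frame": [-2, 2]}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```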

Specifying the Key Values to be Imputed

The keyvals property provides additional key values that should be considered for imputation. If it is not provided, the key values are derived from all unique values of the key field. If there is no grouping field (e.g., no color encoding in the examples above), then keyvals must be specified; otherwise, the impute transform has no effect on the data.

The keyvals property can be an array of key values.
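A minimal sketch with keyvals as an array, assuming hypothetical data with no grouping field: the key value "a" = 3 never appears in the data, so without keyvals nothing would be imputed. Listing 3 in keyvals adds it to the observed keys, and the new tuple receives "b" = 0.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Hypothetical data with no grouping field; keyvals adds the key a = 3, which is absent from the data.",
  "data": {
    "values": [
      {"a": 0, "b": 28}, {"a": 1, "b": 43},
      {"a": 2, "b": 81}, {"a": 4, "b": 19}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"keyvals": [3], "value": 0}
    }
  }
}
```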

Alternatively, the keyvals property can be an object defining a sequence, which can contain the following properties:

Property Type Description
start Number

The starting value of the sequence. Default value: 0

stop Number

Required. The ending value (exclusive) of the sequence.

step Number

The step value between sequence entries. Default value: 1, or -1 if stop < start
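A minimal sketch with keyvals as a sequence, assuming the same hypothetical ungrouped data as the array case: {"start": 0, "stop": 5} describes the keys 0, 1, 2, 3, 4 (stop is exclusive and step defaults to 1), so the absent key "a" = 3 is imputed with "b" = 0.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Hypothetical data with no grouping field; a key sequence 0..4 makes the absent key a = 3 get imputed.",
  "data": {
    "values": [
      {"a": 0, "b": 28}, {"a": 1, "b": 43},
      {"a": 2, "b": 81}, {"a": 4, "b": 19}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"keyvals": {"start": 0, "stop": 5}, "value": 0}
    }
  }
}
```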

Impute Transform

An impute transform can also be specified as a part of the transform array.

// A View Specification
{
  ...
  "transform": [
    ...
    {
      // Impute Transform
      "impute": ...,
      "key": ...,
      "keyvals": ...,
      "groupby": [...],
      "frame": [...],
      "method": ...,
      "value": ...
    }
    ...
  ],
  ...
}
Property Type Description
impute String

Required. The data field for which the missing values should be imputed.

key String

Required. A key field that uniquely identifies data objects within a group. Missing key values (those occurring in the data but not in the current group) will be imputed.

keyvals Any[] | ImputeSequence

Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence.

If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the key field. For impute in encoding, the key field is the x-field if the y-field is imputed, or vice versa.

If there is no impute grouping, this property must be specified.

groupby String[]

An optional array of fields by which to group the values. Imputation will then be performed on a per-group basis.

frame (Null | Number)[]

A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value [-5, 5] indicates that the window should include five objects preceding and five objects following the current object.

Default value: [null, null], indicating that the window includes all objects.

method String

The imputation method to use for the field value of imputed data objects. One of "value", "mean", "median", "max" or "min".

Default value: "value"

value Any

The field value to use when the imputation method is "value".

For example, the chart above that uses impute in encoding can also be created with the impute transform. Here, the key and groupby fields must be specified manually, whereas impute in encoding inferred them automatically.
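A minimal sketch of the transform form, assuming the same illustrative data as the encoding example: "impute" names the field to fill ("b"), "key" names the field whose missing values are detected ("a"), and "groupby" replaces the grouping that the color channel implied in the encoding version.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative data: the transform adds the missing tuple (a = 3, c = 1) with b = 0.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0}, {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0}, {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0}, {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "groupby": ["c"], "value": 0}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"},
    "color": {"field": "c", "type": "nominal"}
  }
}
```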

Similarly, keyvals must be specified if the groupby property is not specified.
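A minimal sketch of an ungrouped impute transform, assuming hypothetical data in which the key "a" = 3 is absent: without "groupby", the key sequence in "keyvals" is what tells the transform that 3 should exist.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Hypothetical ungrouped data; keyvals 0..4 makes the transform add a = 3 with b = 0.",
  "data": {
    "values": [
      {"a": 0, "b": 28}, {"a": 1, "b": 43},
      {"a": 2, "b": 81}, {"a": 4, "b": 19}
    ]
  },
  "transform": [
    {
      "impute": "b",
      "key": "a",
      "keyvals": {"start": 0, "stop": 5},
      "method": "value",
      "value": 0
    }
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```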