Impute
To impute missing data in Vega-Lite, you can use the impute transform, either via an encoding field definition or via a transform array.
The impute transform groups data and determines missing values of the key field within each group. For each missing value in each group, the impute transform will produce a new tuple with the imputed field generated based on a specified imputation method (by using a constant value or by calculating statistics such as mean within each group).
Impute in Encoding Field Definition
// A Single View or a Layer Specification
{
  ...,
  "mark/layer": ...,
  "encoding": {
    "x": {
      "field": ...,
      "type": "quantitative",
      "impute": {...}, // Impute
      ...
    },
    "y": ...,
    ...
  },
  ...
}
An encoding field definition can include an impute definition object to generate new data objects in place of the missing data.
The impute definition object can contain the following properties:
Property | Type | Description |
---|---|---|
frame | [Null \| Number, Null \| Number] | A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value [-5, 5] indicates that the window should include five objects preceding and five objects following the current data object. Default value: [null, null], indicating that the window includes all objects. |
keyvals | Any[] \| ImputeSequence | Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the key field. If there is no impute grouping, this property must be specified. |
method | String | The imputation method to use for the field value of imputed data objects. One of "value", "mean", "median", "max" or "min". Default value: "value" |
value | Any | The field value to use when the imputation method is "value". |
For impute in encoding, the grouping fields and the key field (for identifying missing values) are automatically determined. Values are automatically grouped by the specified fields of mark property channels, key channel and detail channel. If the x-field is imputed, the y-field is the key field, and vice versa. Any key value that is missing from a group leads to a new imputed tuple for that group.
In this example, we impute the y-field ("b"), so the x-field ("a") will be used as the "key". The values are then grouped by the field "c" of the color encoding. The impute transform then determines missing key values within each group. In this case, the data tuple where "a" is 3 and "c" is 1 is missing, so a new tuple with "a" = 3, "c" = 1, and "b" = 0 will be added.
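The embedded chart is not reproduced in this text, so here is a minimal sketch of a specification that matches the description. The data values for "b" are illustrative assumptions; only the field names and the missing tuple ("a" = 3, "c" = 1) come from the text above.

```json
{
  "description": "Sketch only: the b values are made-up sample data.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0},
      {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0},
      {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0},
      {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative", "impute": {"value": 0}},
    "color": {"field": "c", "type": "nominal"}
  }
}
```

With this data, the line for "c" = 1 should extend to "a" = 3 and drop to 0 there, rather than stopping at "a" = 2.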
Besides imputing with a constant value, we can also use a method (such as "mean") on existing data points to generate the missing data.
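As a sketch of the method-based variant, the specification below swaps the constant for "method": "mean". The sample data is again an assumption made for illustration.

```json
{
  "description": "Sketch only: the b values are made-up sample data.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0},
      {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0},
      {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0},
      {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative", "impute": {"method": "mean"}},
    "color": {"field": "c", "type": "nominal"}
  }
}
```

Here the missing tuple in the "c" = 1 group would take the mean of that group's existing "b" values (91, 55 and 53, roughly 66.3) instead of 0.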
The frame property of impute can be used to control the window over which the specified method is applied. Here, the frame is [-2, 2], which indicates that the window for calculating mean includes two objects preceding and two objects following the current object.
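A specification using this frame could look like the following sketch. The data is an illustrative assumption; the "frame" and "method" properties are the ones described above.

```json
{
  "description": "Sketch only: the b values are made-up sample data.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0},
      {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0},
      {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0},
      {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"method": "mean", "frame": [-2, 2]}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```

Compared with the unframed mean, only data objects within two positions of the missing tuple contribute to the imputed value.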
Specifying the Key Values to be Imputed
The keyvals property provides additional key values that should be considered for imputation. If not provided, the key values will be derived from all unique values of the key field. If there is no grouping field (e.g., no color in the examples given above), then keyvals must be specified. Otherwise, the impute transform will have no effect on the data.
The keyvals property can be an array:
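A minimal sketch, assuming the same illustrative data as before: the key value 4 does not occur in the data, so listing it in keyvals asks for an imputed tuple at "a" = 4 in every group.

```json
{
  "description": "Sketch only: the b values are made-up sample data.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0},
      {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0},
      {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0},
      {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"value": 0, "keyvals": [4]}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```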
Alternatively, the keyvals property can be an object defining a sequence, which can contain the following properties:
Property | Type | Description |
---|---|---|
start | Number | The starting value of the sequence. Default value: 0 |
stop | Number | Required. The ending value (exclusive) of the sequence. |
step | Number | The step value between sequence entries. Default value: 1 |
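As a sketch of the sequence form, the object below would generate the key values 4 and 5 (stop is exclusive and step is left at its default). The data is an illustrative assumption.

```json
{
  "description": "Sketch only: the b values are made-up sample data.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0},
      {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0},
      {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0},
      {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {
      "field": "b",
      "type": "quantitative",
      "impute": {"value": 0, "keyvals": {"start": 4, "stop": 6}}
    },
    "color": {"field": "c", "type": "nominal"}
  }
}
```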
Impute Transform
An impute transform can also be specified as a part of the transform array.
// A View Specification
{
  ...
  "transform": [
    ...
    {
      // Impute Transform
      "impute": ...,
      "key": ...,
      "keyvals": ...,
      "groupby": [...],
      "frame": [...],
      "method": ...,
      "value": ...
    }
    ...
  ],
  ...
}
Property | Type | Description |
---|---|---|
impute | String | Required. The data field for which the missing values should be imputed. |
key | String | Required. A key field that uniquely identifies data objects within a group. Missing key values (those occurring in the data but not in the current group) will be imputed. |
keyvals | Any[] \| ImputeSequence | Defines the key values that should be considered for imputation. An array of key values or an object defining a number sequence. If provided, this will be used in addition to the key values observed within the input data. If not provided, the values will be derived from all unique values of the key field. If there is no impute grouping, this property must be specified. |
groupby | String[] | An optional array of fields by which to group the values. Imputation will then be performed on a per-group basis. |
frame | [Null \| Number, Null \| Number] | A frame specification as a two-element array used to control the window over which the specified method is applied. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. For example, the value [-5, 5] indicates that the window should include five objects preceding and five objects following the current data object. Default value: [null, null], indicating that the window includes all objects. |
method | String | The imputation method to use for the field value of imputed data objects. One of "value", "mean", "median", "max" or "min". Default value: "value" |
value | Any | The field value to use when the imputation method is "value". |
For example, the same chart with impute in encoding above can be created using the impute transform. Here, we have to manually specify the key and groupby fields, which were inferred automatically for impute in encoding.
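A sketch of the transform-based version follows. The sample data is an assumption made for illustration; the "impute", "key", "groupby" and "value" properties are the ones listed in the table above.

```json
{
  "description": "Sketch only: the b values are made-up sample data.",
  "data": {
    "values": [
      {"a": 0, "b": 28, "c": 0},
      {"a": 0, "b": 91, "c": 1},
      {"a": 1, "b": 43, "c": 0},
      {"a": 1, "b": 55, "c": 1},
      {"a": 2, "b": 81, "c": 0},
      {"a": 2, "b": 53, "c": 1},
      {"a": 3, "b": 19, "c": 0}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "value": 0, "groupby": ["c"]}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"},
    "color": {"field": "c", "type": "nominal"}
  }
}
```

Note that the y encoding no longer carries an impute object, since the transform has already added the missing tuple before encoding.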
Similarly, keyvals must be specified if the groupby property is not specified.
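For instance, in the sketch below there is a single ungrouped series with no tuple at "a" = 2. Because there is no groupby, the missing key has to be named explicitly in keyvals. The data is an illustrative assumption.

```json
{
  "description": "Sketch only: a made-up single series with a gap at a = 2.",
  "data": {
    "values": [
      {"a": 0, "b": 28},
      {"a": 1, "b": 43},
      {"a": 3, "b": 19}
    ]
  },
  "transform": [
    {"impute": "b", "key": "a", "keyvals": [2], "value": 0}
  ],
  "mark": "line",
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```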