Aggregation


To aggregate data in Vega-Lite, users can either use the aggregate property of an encoding field definition or the aggregate transform inside the transform array. Aggregate summarizes a table as one record for each group. To preserve the original table structure and instead add a new column with the aggregate values, use the join aggregate transform.

Documentation Overview

Aggregate in Encoding Field Definition
Aggregate Transform
- Aggregated Field Definition for Aggregate Transform
Supported Aggregation Operations
Argmin / Argmax
- Example: Labeling Line Chart

Aggregate in Encoding Field Definition

// A Single View or a Layer Specification
{
  ...,
  "mark/layer": ...,
  "encoding": {
    "x": {
      "aggregate": ..., // aggregate
      "field": ...,
      "type": "quantitative",
      ...
    },
    "y": ...,
    ...
  },
  ...
}

The aggregate property of a field definition can be used to compute aggregate summary statistics (e.g., median, min, max) over groups of data.

If at least one field in the specified encoding channels contains aggregate, the resulting visualization will show aggregate data. In this case, all fields without a specified aggregation function are treated as group-by fields¹ in the aggregation process.

For example, the following bar chart shows the mean of Acceleration, grouped by the number of Cylinders.

Note: aggregated fields are quantitative by default while unaggregated (group by) fields in aggregated encodings are nominal by default.
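
As a hedged illustration of the bar chart described above, a minimal specification might look like the following. The `$schema` version and the relative `data/cars.json` path (the cars dataset from vega-datasets, as resolved by the online Vega editor) are assumptions rather than part of the text above. Field types are written out explicitly here instead of relying on the defaults mentioned in the note.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes the vega-datasets cars.json file is reachable at this relative path.",
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "Cylinders", "type": "ordinal"},
    "y": {"aggregate": "mean", "field": "Acceleration", "type": "quantitative"}
  }
}
```

Because `y` carries an aggregate and `x` does not, Cylinders acts as the group-by field and one bar is drawn per distinct cylinder count.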

The detail channel can be used to specify additional summary and group-by fields without mapping the field(s) to any visual properties. For example, the following plot adds Origin as a group-by field.
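
A minimal sketch of using detail as an extra group-by field could look like this. The choice of Horsepower and Displacement as the aggregated fields, the `$schema` version, and the relative data path are assumptions made for illustration; only the use of Origin on the detail channel comes from the text above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: the aggregated fields and the data path are illustrative assumptions.",
  "data": {"url": "data/cars.json"},
  "mark": "point",
  "encoding": {
    "x": {"aggregate": "mean", "field": "Horsepower", "type": "quantitative"},
    "y": {"aggregate": "mean", "field": "Displacement", "type": "quantitative"},
    "detail": {"field": "Origin", "type": "nominal"}
  }
}
```

Without the detail channel this specification would collapse to a single point; with it, one point is produced per Origin value, although Origin is not mapped to position, color, or shape.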

¹The group-by fields are also known as independent/condition variables in statistics and dimensions in Business Intelligence. Similarly, the aggregate fields are known as dependent variables and measures.

Aggregate Transform

// Any View Specification
{
  ...
  "transform": [
    {
      // Aggregate Transform
      "aggregate": [{"op": ..., "field": ..., "as": ...}],
      "groupby": [...]
    }
    ...
  ],
  ...
}

For example, here is the same bar chart showing the mean of Acceleration grouped by the number of Cylinders, this time computed with an aggregate transform in the transform array.
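
One way this transform-based version might be written is sketched below. The output field name `mean_acc`, the `$schema` version, and the relative data path are assumptions chosen for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: 'mean_acc' is a hypothetical output field name and the data path is assumed.",
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "mean", "field": "Acceleration", "as": "mean_acc"}
      ],
      "groupby": ["Cylinders"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Cylinders", "type": "ordinal"},
    "y": {"field": "mean_acc", "type": "quantitative"}
  }
}
```

After the transform runs, the encoding sees only the grouped table (one row per Cylinders value with a `mean_acc` column), so the encoding channels reference the output name rather than the original Acceleration field.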

An aggregate transform in the transform array has the following properties:

Property	Type	Description
aggregate	AggregatedFieldDef[]	*Required.* Array of objects that define fields to aggregate.
groupby	String[]	The data fields to group by. If not specified, a single group containing all data objects will be used.

Aggregated Field Definition for Aggregate Transform

Property	Type	Description
op	String	*Required.* The aggregation operation to apply to the fields (e.g., `"sum"`, `"average"`, or `"count"`). See the full list of supported aggregation operations for more information.
field	String	The data field for which to compute the aggregate function. This is required for all aggregation operations except `"count"`.
as	String	*Required.* The output field name to use for the aggregated field.

Note: It is important to parse your data types explicitly, especially if your dataset is likely to contain null values, which can cause automatic type inference to fail.
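
As a hedged sketch of explicit parsing, the `format.parse` property of the data definition can declare the type of each field before any aggregation happens. The fields chosen here, the `$schema` version, and the relative data path are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: parses Horsepower as a number explicitly instead of relying on type inference.",
  "data": {
    "url": "data/cars.json",
    "format": {
      "type": "json",
      "parse": {"Horsepower": "number", "Year": "date"}
    }
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "Origin", "type": "nominal"},
    "y": {"aggregate": "mean", "field": "Horsepower", "type": "quantitative"}
  }
}
```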

Supported Aggregation Operations

The supported aggregation operations are:

Operation	Description
count	The total count of data objects in the group. Note: `count` operates directly on the input objects and returns the same value regardless of the provided field.
valid	The count of field values that are not `null`, `undefined` or `NaN`.
values	A list of data objects in the group.
missing	The count of `null` or `undefined` field values.
distinct	The count of distinct field values.
sum	The sum of field values.
product	The product of field values.
mean	The mean (average) field value.
average	The mean (average) field value. Identical to mean.
variance	The sample variance of field values.
variancep	The population variance of field values.
stdev	The sample standard deviation of field values.
stdevp	The population standard deviation of field values.
stderr	The standard error of field values.
median	The median field value.
q1	The lower quartile boundary of field values.
q3	The upper quartile boundary of field values.
ci0	The lower boundary of the bootstrapped 95% confidence interval of the mean field value.
ci1	The upper boundary of the bootstrapped 95% confidence interval of the mean field value.
min	The minimum field value.
max	The maximum field value.
argmin	An input data object containing the minimum field value. Note: When used inside encoding, `argmin` must be specified as an object. (See below for an example.)
argmax	An input data object containing the maximum field value. Note: When used inside encoding, `argmax` must be specified as an object. (See below for an example.)
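
Several of the operations listed above can be combined in a single aggregate transform. The following is a sketch under stated assumptions: the output field names (`n_cars`, `n_valid_hp`, and so on), the `$schema` version, and the relative data path are all hypothetical choices made for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: output field names are hypothetical and the data path is assumed.",
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "count", "as": "n_cars"},
        {"op": "valid", "field": "Horsepower", "as": "n_valid_hp"},
        {"op": "missing", "field": "Horsepower", "as": "n_missing_hp"},
        {"op": "median", "field": "Horsepower", "as": "median_hp"},
        {"op": "stdev", "field": "Horsepower", "as": "stdev_hp"}
      ],
      "groupby": ["Origin"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "median_hp", "type": "quantitative"},
    "y": {"field": "Origin", "type": "nominal"},
    "tooltip": [
      {"field": "n_cars", "type": "quantitative"},
      {"field": "n_valid_hp", "type": "quantitative"},
      {"field": "n_missing_hp", "type": "quantitative"},
      {"field": "stdev_hp", "type": "quantitative"}
    ]
  }
}
```

Note that the `count` entry omits `field`, since it is the one operation that does not require it, while `valid` and `missing` report how many Horsepower values in each group are usable or absent.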

Argmin / Argmax

Sometimes, you may not want to find the minimum or maximum of a field, but instead the value from a field that corresponds to the minimum or maximum value in another field. In these cases you can use the argmin and argmax aggregates.

The argmax and argmin operations can be specified in an encoding field definition by setting aggregate to an object with an argmax or argmin property that names the field to maximize or minimize. For example, the following plot shows the production budget of the movie that has the highest US Gross in each major genre.
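
A specification along these lines might look like the sketch below. It assumes the vega-datasets movies file at a relative path, with field names spelled `US Gross`, `Production Budget`, and `Major Genre`; the `$schema` version and the bar mark are likewise assumptions.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes movies.json with fields 'US Gross', 'Production Budget', and 'Major Genre'.",
  "data": {"url": "data/movies.json"},
  "mark": "bar",
  "encoding": {
    "x": {
      "aggregate": {"argmax": "US Gross"},
      "field": "Production Budget",
      "type": "quantitative"
    },
    "y": {"field": "Major Genre", "type": "nominal"}
  }
}
```

Here `aggregate` is an object rather than a string: `argmax` names the field to maximize, while `field` names the value to read from the record where that maximum occurs.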

This is equivalent to specifying argmax in an aggregate transform and encoding its nested data.
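
The transform-based form could be sketched as follows. The output name `argmax_US_Gross`, the dataset field names, the `$schema` version, and the relative data path are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: 'argmax_US_Gross' is a hypothetical output name and the data path is assumed.",
  "data": {"url": "data/movies.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "argmax", "field": "US Gross", "as": "argmax_US_Gross"}
      ],
      "groupby": ["Major Genre"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "argmax_US_Gross['Production Budget']", "type": "quantitative"},
    "y": {"field": "Major Genre", "type": "nominal"}
  }
}
```

Because argmax yields a whole input data object per group, the encoding reaches into that object with bracket notation to pull out the Production Budget value.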

Example: Labeling Line Chart

argmax can be useful for getting the last value in a line for label placement.
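
A sketch of this labeling technique is shown below, assuming a stocks dataset with `symbol`, `date`, and `price` fields at a relative path; the `$schema` version, mark styling, and offsets are illustrative assumptions as well. The second layer places a circle and a text label at the latest date of each line, using argmax over `date` to read the matching price.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes stocks.csv with 'symbol', 'date', and 'price' fields.",
  "data": {"url": "data/stocks.csv"},
  "encoding": {
    "color": {"field": "symbol", "type": "nominal", "legend": null}
  },
  "layer": [
    {
      "mark": "line",
      "encoding": {
        "x": {"field": "date", "type": "temporal", "title": "date"},
        "y": {"field": "price", "type": "quantitative", "title": "price"}
      }
    },
    {
      "encoding": {
        "x": {"aggregate": "max", "field": "date", "type": "temporal"},
        "y": {"aggregate": {"argmax": "date"}, "field": "price", "type": "quantitative"}
      },
      "layer": [
        {"mark": {"type": "circle"}},
        {
          "mark": {"type": "text", "align": "left", "dx": 4},
          "encoding": {"text": {"field": "symbol", "type": "nominal"}}
        }
      ]
    }
  ]
}
```

The shared color encoding on `symbol` acts as the group-by field in the label layer, so one label is produced per line, and the legend is turned off because the labels take its place.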