Aggregation


To aggregate data in Vega-Lite, users can either use the aggregate property of an encoding field definition or the aggregate transform inside the transform array.


Aggregate in Encoding Field Definition

// A Single View Specification
{
  "data": ... ,
  "mark": ... ,
  "encoding": {
    "x": {
      "aggregate": ...,               // aggregate
      "field": ...,
      "type": "quantitative",
      ...
    },
    "y": ...,
    ...
  },
  ...
}

The aggregate property of a field definition can be used to compute aggregate summary statistics (e.g., median, min, max) over groups of data.

If at least one field in the specified encoding channels has an aggregate, the resulting visualization shows aggregated data. In this case, all fields without a specified aggregation function are treated as group-by fields[1] in the aggregation process.

For example, the following bar chart shows the mean of Acceleration, grouped by the number of Cylinders.
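A specification along these lines produces such a chart. It is a sketch that assumes the cars example dataset is served at data/cars.json. Only x carries an aggregate, so Cylinders on y becomes the group-by field.

```json
{
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "encoding": {
    "x": {"aggregate": "mean", "field": "Acceleration", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"}
  }
}
```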

The detail channel can be used to specify additional summary and group-by fields without mapping the field(s) to any visual property. For example, the following plots add Origin as a group-by field.
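As a hedged sketch (again assuming data/cars.json), adding Origin to the detail channel of a mean-Acceleration-by-Cylinders encoding splits each Cylinders group by Origin. The chart then shows one point per (Cylinders, Origin) pair without encoding Origin as color, shape, or position. A point mark is used here so that the points of each Cylinders row stay distinguishable.

```json
{
  "data": {"url": "data/cars.json"},
  "mark": "point",
  "encoding": {
    "x": {"aggregate": "mean", "field": "Acceleration", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"},
    "detail": {"field": "Origin", "type": "nominal"}
  }
}
```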

[1] The group-by fields are also known as independent (or conditioning) variables in statistics and as dimensions in business intelligence. Similarly, the aggregate fields are known as dependent variables and measures, respectively.

Aggregate Transform

// A View Specification
{
  ...
  "transform": [
    {
      // Aggregate Transform
      "aggregate": [{"op": ..., "field": ..., "as": ...}],
      "groupby": [...]
    },
    ...
  ],
  ...
}

For example, here is the same bar chart, showing the mean of Acceleration grouped by the number of Cylinders, but this time computed with an aggregate transform in the transform array instead of the aggregate property of an encoding field definition.
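A sketch of that specification, assuming the same data/cars.json dataset. The output name mean_acc is an arbitrary choice for the as property; the encoding then refers to it as an ordinary quantitative field with no aggregate of its own.

```json
{
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [{"op": "mean", "field": "Acceleration", "as": "mean_acc"}],
      "groupby": ["Cylinders"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "mean_acc", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"}
  }
}
```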

An aggregate transform in the transform array has the following properties:

Property   Type                  Description
aggregate  AggregatedFieldDef[]  Required. Array of objects that define fields to aggregate.
groupby    String[]              The data fields to group by. If not specified, a single group containing all data objects will be used.
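When groupby is omitted, every row falls into one group. The sketch below assumes data/cars.json and its Horsepower field (mean_hp is an arbitrary output name). It reduces the dataset to a single row holding the overall mean and renders that value with a text mark.

```json
{
  "data": {"url": "data/cars.json"},
  "transform": [
    {"aggregate": [{"op": "mean", "field": "Horsepower", "as": "mean_hp"}]}
  ],
  "mark": "text",
  "encoding": {
    "text": {"field": "mean_hp", "type": "quantitative", "format": ".1f"}
  }
}
```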

Aggregated Field Definition for Aggregate Transform

Property   Type    Description
op         String  Required. The aggregation operation to apply to the field, such as "sum", "average", or "count". See Supported Aggregation Operations below for the full list.
field      String  The data field for which to compute the aggregate function. This is required for all aggregation operations except "count".
as         String  Required. The output field name to use for this aggregated field.

Note: It is important to parse your data types explicitly, especially if your dataset is likely to contain null values, in which case automatic type inference may fail.
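As a sketch of explicit parsing, the data definition below uses the format.parse property to declare field types up front instead of relying on inference. It assumes a cars dataset at data/cars.json with Acceleration and Cylinders fields; adjust the URL and field names for your own data.

```json
{
  "data": {
    "url": "data/cars.json",
    "format": {
      "parse": {"Acceleration": "number", "Cylinders": "number"}
    }
  },
  "mark": "bar",
  "encoding": {
    "x": {"aggregate": "mean", "field": "Acceleration", "type": "quantitative"},
    "y": {"field": "Cylinders", "type": "ordinal"}
  }
}
```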

Supported Aggregation Operations

The supported aggregation operations are:

Operation  Description
count      The total count of data objects in the group. Note: "count" operates directly on the input objects and returns the same value regardless of the provided field. Similar to SQL's count(*), count can be specified with the field "*".
valid      The count of field values that are not null, undefined, or NaN.
missing    The count of null or undefined field values.
distinct   The count of distinct field values.
sum        The sum of field values.
mean       The mean (average) field value.
average    The mean (average) field value. Identical to mean.
variance   The sample variance of field values.
variancep  The population variance of field values.
stdev      The sample standard deviation of field values.
stdevp     The population standard deviation of field values.
stderr     The standard error of field values.
median     The median field value.
q1         The lower quartile boundary of field values.
q3         The upper quartile boundary of field values.
ci0        The lower boundary of the bootstrapped 95% confidence interval of the mean field value.
ci1        The upper boundary of the bootstrapped 95% confidence interval of the mean field value.
min        The minimum field value.
max        The maximum field value.
argmin     An input data object containing the minimum field value.
argmax     An input data object containing the maximum field value.
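To see how count, valid, missing, and distinct differ, a single aggregate transform can compute them side by side. This sketch assumes data/cars.json, where some Horsepower values may be null; the as names are arbitrary. Per the definitions above, n_rows should equal n_valid_hp plus n_missing_hp whenever Horsepower contains no NaN values, since NaN is counted by neither.

```json
{
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "count", "as": "n_rows"},
        {"op": "valid", "field": "Horsepower", "as": "n_valid_hp"},
        {"op": "missing", "field": "Horsepower", "as": "n_missing_hp"},
        {"op": "distinct", "field": "Name", "as": "n_distinct_names"}
      ],
      "groupby": ["Origin"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "n_missing_hp", "type": "quantitative"},
    "y": {"field": "Origin", "type": "nominal"},
    "tooltip": [
      {"field": "n_rows", "type": "quantitative"},
      {"field": "n_valid_hp", "type": "quantitative"},
      {"field": "n_distinct_names", "type": "quantitative"}
    ]
  }
}
```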

Note: Because argmax and argmin return entire input objects, their output fields are nested. When accessing these aggregated fields, they must be flattened due to an issue with nested fields. They can be flattened with the calculate transform, as done in the CO2 example.
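A sketch of that flattening, assuming data/cars.json with Name, Horsepower, and Origin fields (the output names most_powerful, name, and hp are arbitrary). argmax keeps the whole input object for each Origin group, and the two calculate transforms copy the needed properties into top-level fields that the encoding can reference directly.

```json
{
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [{"op": "argmax", "field": "Horsepower", "as": "most_powerful"}],
      "groupby": ["Origin"]
    },
    {"calculate": "datum.most_powerful.Name", "as": "name"},
    {"calculate": "datum.most_powerful.Horsepower", "as": "hp"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "hp", "type": "quantitative"},
    "y": {"field": "Origin", "type": "nominal"},
    "tooltip": {"field": "name", "type": "nominal"}
  }
}
```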