Aggregation
To aggregate data in Vega-Lite, users can either use an aggregate property of an encoding field definition or an aggregate transform inside the transform array.
Aggregate in Encoding Field Definition
// A Single View Specification
{
  "data": ...,
  "mark": ...,
  "encoding": {
    "x": {
      "aggregate": ..., // aggregate
      "field": ...,
      "type": "quantitative",
      ...
    },
    "y": ...,
    ...
  },
  ...
}
The aggregate property of a field definition can be used to compute aggregate summary statistics (e.g., median, min, max) over groups of data.
If at least one field in the specified encoding channels contains aggregate, the resulting visualization will show aggregate data. In this case, all fields without a specified aggregation function are treated as groupby fields^{1} in the aggregation process.
For example, the following bar chart aggregates the mean of Acceleration, grouped by the number of Cylinders.
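A minimal sketch of such a bar chart is shown below. It assumes the cars dataset is available at data/cars.json and that the v5 schema is in use; both the data URL and the schema version are assumptions, not taken from this page.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes a cars dataset at data/cars.json with Cylinders and Acceleration fields.",
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "encoding": {
    "x": {"field": "Cylinders", "type": "ordinal"},
    "y": {"aggregate": "mean", "field": "Acceleration", "type": "quantitative"}
  }
}
```

Because only the y channel specifies aggregate, Cylinders is treated as the groupby field, so the chart draws one bar per distinct Cylinders value.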
The detail channel can be used to specify additional summary and groupby fields without mapping the field(s) to any visual properties. For example, the following plots add Origin as a groupby field.
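One way to use detail as an extra groupby field is sketched here. The choice of a point mark and of the Horsepower and Displacement fields is illustrative, and the data URL is an assumption.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes a cars dataset at data/cars.json; the fields plotted are illustrative.",
  "data": {"url": "data/cars.json"},
  "mark": "point",
  "encoding": {
    "x": {"aggregate": "mean", "field": "Horsepower", "type": "quantitative"},
    "y": {"aggregate": "mean", "field": "Displacement", "type": "quantitative"},
    "detail": {"field": "Origin", "type": "nominal"}
  }
}
```

Without the detail channel, both aggregated channels would collapse to a single point. With it, the spec produces one point per Origin value, even though Origin is not mapped to position, color, or shape.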
^{1}The groupby fields are also known as independent/condition variables in statistics and dimensions in Business Intelligence. Similarly, the aggregate fields are known as dependent variables and measures.
Aggregate Transform
// A View Specification
{
  ...
  "transform": [
    {
      // Aggregate Transform
      "aggregate": [{"op": ..., "field": ..., "as": ...}],
      "groupby": [...]
    },
    ...
  ],
  ...
}
For example, here is the same bar chart, which aggregates the mean of Acceleration grouped by the number of Cylinders, but this time using the aggregate property as part of the transform.
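A sketch of the transform-based version follows. The output field name mean_acc is a name chosen here for illustration, and the data URL and schema version are assumptions.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes a cars dataset at data/cars.json; mean_acc is an illustrative output name.",
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [{"op": "mean", "field": "Acceleration", "as": "mean_acc"}],
      "groupby": ["Cylinders"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Cylinders", "type": "ordinal"},
    "y": {"field": "mean_acc", "type": "quantitative"}
  }
}
```

Here the encoding refers to the already aggregated field mean_acc, so no aggregate property is needed on the y channel.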
An aggregate transform in the transform array has the following properties:
Property  Type  Description

aggregate  AggregatedFieldDef[]  Required. Array of objects that define fields to aggregate.
groupby  String[]  The data fields to group by. If not specified, a single group containing all data objects will be used.
Aggregated Field Definition for Aggregate Transform
Property  Type  Description

op  AggregateOp  Required. The aggregation operation to apply to the field, such as sum, average, or count. See the full list of supported aggregation operations for more information.
field  String  Required. The data field for which to compute the aggregate function.
as  String  Required. The output field name to use for the aggregated field.
Note: It is important to parse your data types explicitly, especially if your dataset is likely to contain null values, in which case automatic type inference will fail.
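As a sketch of explicit parsing, and assuming the note refers to the parse property of the data format object, field types could be declared like this. The data URL and the choice of fields are assumptions.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes a cars dataset at data/cars.json; parse entries are illustrative.",
  "data": {
    "url": "data/cars.json",
    "format": {
      "type": "json",
      "parse": {"Acceleration": "number", "Cylinders": "number"}
    }
  },
  "transform": [
    {
      "aggregate": [{"op": "mean", "field": "Acceleration", "as": "mean_acc"}],
      "groupby": ["Cylinders"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Cylinders", "type": "ordinal"},
    "y": {"field": "mean_acc", "type": "quantitative"}
  }
}
```

Declaring the types up front means the aggregation does not depend on what type inference concludes from the first few rows.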
Supported Aggregation Operations
The supported aggregation operations are:
Operation  Description 

count  The total count of data objects in the group. Note: ‘count’ operates directly on the input objects and returns the same value regardless of the provided field. Similar to SQL’s count(*), count can be specified with a field "*".
valid  The count of field values that are not null, undefined, or NaN.
missing  The count of null or undefined field values. 
distinct  The count of distinct field values. 
sum  The sum of field values. 
mean  The mean (average) field value. 
average  The mean (average) field value. Identical to mean. 
variance  The sample variance of field values. 
variancep  The population variance of field values. 
stdev  The sample standard deviation of field values. 
stdevp  The population standard deviation of field values. 
stderr  The standard error of field values. 
median  The median field value. 
q1  The lower quartile boundary of field values. 
q3  The upper quartile boundary of field values. 
ci0  The lower boundary of the bootstrapped 95% confidence interval of the mean field value. 
ci1  The upper boundary of the bootstrapped 95% confidence interval of the mean field value. 
min  The minimum field value. 
max  The maximum field value. 
argmin  An input data object containing the minimum field value. 
argmax  An input data object containing the maximum field value. 
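Several of these operations can be combined in a single aggregate transform. The sketch below computes the lower and upper quartile boundaries of Horsepower per Origin and draws them as a ranged bar. The dataset, field choices, and output names (hp_q1, hp_median, hp_q3) are assumptions made for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes a cars dataset at data/cars.json; output field names are illustrative.",
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "q1", "field": "Horsepower", "as": "hp_q1"},
        {"op": "median", "field": "Horsepower", "as": "hp_median"},
        {"op": "q3", "field": "Horsepower", "as": "hp_q3"}
      ],
      "groupby": ["Origin"]
    }
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Origin", "type": "nominal"},
    "y": {"field": "hp_q1", "type": "quantitative"},
    "y2": {"field": "hp_q3"},
    "tooltip": {"field": "hp_median", "type": "quantitative"}
  }
}
```

Each entry in the aggregate array yields one output field per group, so the transform's output has one row per Origin with three computed columns.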
Note: When accessing aggregated argmax/argmin fields, the aggregated fields must be flattened, due to the nested field issue. The aggregated fields can be flattened with the calculate transform as done in the CO2 example.
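The flattening step might look like the following sketch, which is not the CO2 example itself. It assumes a cars dataset at data/cars.json with Name, Horsepower, and Origin fields; the names argmax_hp, max_hp, and max_hp_name are chosen here for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: assumes a cars dataset at data/cars.json; all output field names are illustrative.",
  "data": {"url": "data/cars.json"},
  "transform": [
    {
      "aggregate": [{"op": "argmax", "field": "Horsepower", "as": "argmax_hp"}],
      "groupby": ["Origin"]
    },
    {"calculate": "datum.argmax_hp.Horsepower", "as": "max_hp"},
    {"calculate": "datum.argmax_hp.Name", "as": "max_hp_name"}
  ],
  "mark": "bar",
  "encoding": {
    "x": {"field": "Origin", "type": "nominal"},
    "y": {"field": "max_hp", "type": "quantitative"},
    "tooltip": {"field": "max_hp_name", "type": "nominal"}
  }
}
```

The argmax operation stores a whole input data object in argmax_hp, so the two calculate transforms copy the needed properties into top-level fields that the encoding can reference directly.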