This website is for Vega-Lite v1. Go to the main Vega-Lite homepage for the latest release.

Transformation

Data transformations in Vega-Lite are described via either top-level transforms (the transform property) or inline transforms inside encoding (aggregate, bin, timeUnit, and sort).

When both types of transforms are specified, the top-level transforms are executed first in this order: filterInvalid, calculate, and then filter. Then the inline transforms are executed in this order: bin, timeUnit, aggregate, and sort.
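To make the ordering concrete, here is a minimal sketch of a specification that combines a top-level filter with inline bin and aggregate transforms. The inline data values and the field name hp are hypothetical and chosen only for illustration.

```json
{
  "description": "Sketch with hypothetical data: the top-level filter runs first, then the inline bin and aggregate.",
  "data": {
    "values": [
      {"hp": 80},
      {"hp": 120},
      {"hp": 150},
      {"hp": 210}
    ]
  },
  "transform": {
    "filter": "datum.hp > 100"
  },
  "mark": "bar",
  "encoding": {
    "x": {"bin": true, "field": "hp", "type": "quantitative"},
    "y": {"aggregate": "count", "field": "*", "type": "quantitative"}
  }
}
```

Under the order described above, the item with hp equal to 80 is removed by the filter before any binning or counting happens, so it should not contribute to any bar.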

The rest of this page describes the top-level transform property. For more information about inline transforms, please see the following pages: bin, timeUnit, aggregate, and sort.

Top-level Transform Property

{
  "data": ... ,
  "transform": {       // transform
    "filterInvalid": ...,
    "calculate": ...,
    "filter": ...
  },
  "mark": ... ,
  "encoding": ... ,
  ...
}

The top-level transform object supports the following transformation properties:

| Property | Type | Description |
| --- | --- | --- |
| filterInvalid | Boolean | Whether to filter invalid values (null and NaN) from the data. By default (undefined), only quantitative and temporal fields are filtered. If set to true, all data items with null values are filtered. If false, all data items are included; in this case, null values will be interpreted as zeroes. |
| calculate | Formula[] | An array of formula objects for deriving new fields. Each formula object has two properties: field (String), the field name in which to store the computed value; and expr (String), a string containing an expression for the formula. Use the variable datum to refer to the current data object. |
| filter | String \| FilterObject \| String[] \| FilterObject[] | A filter object or a Vega Expression string for filtering data items (or rows), or an array of either filter objects or expression strings. |

These transforms are executed in this order: filterInvalid, calculate, and then filter. Since calculate is before filter, derived fields can be used in filter’s expression.

Example

This example uses calculate to derive a new field, then filters the data based on the new field.
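A minimal sketch of such a specification might look like the following. The inline data values and the field names a, b, and b2 are assumptions made for illustration; only the calculate and filter properties come from the description above.

```json
{
  "description": "Sketch with hypothetical data: derive b2 from b with calculate, then filter on b2.",
  "data": {
    "values": [
      {"a": "A", "b": 28},
      {"a": "B", "b": 55},
      {"a": "C", "b": 43},
      {"a": "D", "b": 19}
    ]
  },
  "transform": {
    "calculate": [
      {"field": "b2", "expr": "2 * datum.b"}
    ],
    "filter": "datum.b2 > 60"
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b2", "type": "quantitative"}
  }
}
```

Because calculate runs before filter, the filter expression can refer to b2. With these values, only the items B (b2 = 110) and C (b2 = 86) satisfy the predicate.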

Filter

Vega-Lite’s transform.filter property can be (1) a filter predicate object, (2) a Vega Expression string, or (3) an array of filter predicates (each either a predicate object or an expression string) that must all be true for a datum to be included.

Filter Object

For a filter object, a field must be provided with one of the filter operators (equal, in, range). Values of these operators can be primitive types (string, number, boolean) or a DateTime definition object for describing time. In addition, timeUnit can be provided to further transform a temporal field.

The following table describes properties of a filter object.

| Property | Type | Description |
| --- | --- | --- |
| field | String | Field to be filtered. |
| equal | String \| Number \| DateTime \| Boolean | Value that the field’s value should be equal to. |
| range | Number[] \| DateTime[] | Array of length 2 describing the (inclusive) minimum and maximum values for the field’s value to be included in the filtered data. If the minimum / maximum is null, then the range has an unbounded minimum / maximum. |
| oneOf | String[] \| Number[] \| DateTime[] | A set of values that the field’s value should be a member of, for a data item to be included in the filtered data. |

Date Time Definition Object

A DateTime object must have at least one of the following properties:

| Property | Type | Description |
| --- | --- | --- |
| year | Number | Integer value representing the year. |
| quarter | Number | Integer value representing the quarter of the year (from 1-4). |
| month | Number \| String | One of: (1) an integer value representing the month from 1-12, where 1 represents January; (2) a case-insensitive month name (e.g., "January"); (3) a case-insensitive, 3-character short month name (e.g., "Jan"). |
| date | Number | Integer value representing the date from 1-31. |
| day | Number \| String | Value representing the day of week. One of: (1) an integer value, where 1 represents Monday; (2) a case-insensitive day name (e.g., "Monday"); (3) a case-insensitive, 3-character short day name (e.g., "Mon"). Warning: A DateTime definition object with day should not be combined with year, quarter, month, or date. |
| hours | Number | Integer value representing the hour of day from 0-23. |
| minutes | Number | Integer value representing the minute segment of a time from 0-59. |
| seconds | Number | Integer value representing the second segment of a time from 0-59. |
| milliseconds | Number | Integer value representing the millisecond segment of a time. |

Examples

  • {"field": "car_color", "equal": "red"} checks if the car_color field’s value is equal to "red".
  • {"field": "car_color", "in":["red", "yellow"]} checks if the car_color field’s value is "red" or "yellow".
  • {"field": "x", "range": [0, 5]} checks if the x field’s value is in range [0,5] (0 ≤ x ≤ 5).
  • {"field": "x", "range": [null, 5]} checks if the x field’s value is in range [-Infinity,5] (x ≤ 5).
  • {"timeUnit": "year", "field": "date", "range": [2006, 2008] } checks if the date’s value is between year 2006 and 2008.
  • {"field": "date", "range": [{"year": 2006, "month": "jan", "date": 1}, {"year": 2008, "month": "feb", "date": 20}] } checks if the date’s value is between Jan 1, 2006 and Feb 20, 2008.
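For context, the last filter object above could be embedded in a full specification roughly as follows. This is a sketch only: the inline data values and the price field are hypothetical, and it assumes the date strings are parsed as dates for the temporal field.

```json
{
  "description": "Sketch with hypothetical data: keep only items whose date lies between Jan 1, 2006 and Feb 20, 2008.",
  "data": {
    "values": [
      {"date": "2005-06-15", "price": 10},
      {"date": "2007-03-10", "price": 14},
      {"date": "2009-01-05", "price": 9}
    ]
  },
  "transform": {
    "filter": {
      "field": "date",
      "range": [
        {"year": 2006, "month": "jan", "date": 1},
        {"year": 2008, "month": "feb", "date": 20}
      ]
    }
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "date", "type": "temporal"},
    "y": {"field": "price", "type": "quantitative"}
  }
}
```

Of the three hypothetical items, only the one dated in 2007 falls inside the range, so it should be the only point drawn.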

Filter Expression

For a Vega Expression string, each datum object can be referred to using the bound variable datum. For example, setting filter to "datum.b2 > 60" makes the output data include only items whose value in the field b2 is over 60.

Filter Array

For a filter array, the array’s members should be either filter objects or filter expressions. All member predicates must be satisfied for a data item to be included in the filtered data. In other words, the filter array forms a conjunctive predicate that joins all predicates with “and” operators.
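As an illustration, the sketch below mixes a filter object and an expression string in one filter array. The data values and the field names x and y are hypothetical.

```json
{
  "description": "Sketch with hypothetical data: an item is kept only if both predicates in the filter array hold.",
  "data": {
    "values": [
      {"x": 1, "y": 20},
      {"x": 3, "y": 5},
      {"x": 7, "y": 30}
    ]
  },
  "transform": {
    "filter": [
      {"field": "x", "range": [0, 5]},
      "datum.y > 10"
    ]
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "x", "type": "quantitative"},
    "y": {"field": "y", "type": "quantitative"}
  }
}
```

Here only the first item passes: the second fails the expression (y is 5) and the third fails the range predicate (x is 7).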