Data

Data set definitions and transforms define the data to load and how to process it.

The basic data model used by Vega is tabular data, similar to a spreadsheet or database table. Individual data sets are assumed to contain a collection of records (or “rows”), which may contain any number of named data attributes (fields, or “columns”). Records are modeled using standard JavaScript objects.

If the input data is simply an array of primitive values, Vega maps each value to the data property of a new object. For example, [5, 3, 8, 1] is loaded as:

[ {"data": 5}, {"data": 3}, {"data": 8}, {"data": 1} ]

Upon ingest, Vega also assigns each data object a unique id property, stored under a custom Symbol key. As a result, the id is not accessible via a string key and is not enumerable, though you can observe its value when inspecting data objects in a JavaScript console.

Data sets can be specified directly by defining data inline or providing a URL from which to load the data. Alternatively, data can be bound dynamically at runtime by using the View API to provide data when a chart is instantiated or issue streaming updates. Loading data from a URL will be subject to the policies of your runtime environment (e.g., cross-origin request rules).

Data Properties

Properties for specifying a data set. At most one of the source, url, or values properties should be defined.

Property Type Description
name String Required. A unique name for the data set.
format Format An object that specifies the format for parsing the data file or values. See the format reference for more.
source String | String[] The name of one or more data sets to use as the source for this data set. The source property is useful in combination with a transform pipeline to derive new data. If string-valued, indicates the name of the source data set. If array-valued, specifies a collection of data source names that should be merged (unioned) together.
url String A URL from which to load the data set. Use the format property to ensure the loaded data is correctly parsed. If the format property is not specified, the data is assumed to be in a row-oriented JSON format.
values Any The full data set, included inline. The values property allows data to be included directly within the specification itself. While most commonly an array of objects, other data types (such as CSV strings) may be used, subject to the format settings.
on Trigger[] An array of updates to insert, remove, & toggle data values, or clear the data when trigger conditions are met. See the trigger reference for more, and the sketch after this table.
transform Transform[] An array of transforms to perform on the input data. The output of the transform pipeline then becomes the value of this data set. See the transform reference for more.
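
The on property can be driven by signals defined elsewhere in the specification. As a minimal sketch, assuming signals named "clicked" and "clear" exist, the following data set toggles values in and out of a selection and empties it when requested:

{
  "name": "selected",
  "on": [
    {"trigger": "clicked", "toggle": "clicked"},
    {"trigger": "clear", "remove": true}
  ]
}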

Dynamic Data Loading

For Vega version 4.2 and higher, the data url parameter and (when used with URL loading) the format parameter may include signal references. This feature allows either the source url or one or more formatting parameters to be dynamically changed at runtime, causing the data to be reloaded. For example, a single spec might load a different data set based on user input, or the data might be polled at a regular interval in conjunction with a timer event stream.

If no signals are used (the traditional configuration), external data sources are loaded immediately upon view construction, and the first dataflow evaluation is delayed until data loading is complete. With dynamic loading, the dataflow must first be evaluated in order to determine the signal values, and only then can the data be loaded. As a result, downstream transforms and encodings may initially be evaluated against empty data sets; be sure any signal expressions behave appropriately with empty data, including downstream concerns such as empty scale domains.
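
For example, a bound signal can select which file to load; changing the signal value causes the data to be reloaded. This is a minimal sketch, and the year signal and data/table-<year>.json naming scheme are purely illustrative:

{
  "signals": [
    {
      "name": "year",
      "value": 2010,
      "bind": {"input": "range", "min": 1990, "max": 2020, "step": 5}
    }
  ],
  "data": [
    {
      "name": "table",
      "url": {"signal": "'data/table-' + year + '.json'"}
    }
  ]
}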

Format

The format object describes the data format and additional parsing instructions.

Name Type Description
type String The data format type. The currently supported data formats are json (the default), csv (comma-separated values), tsv (tab-separated values), dsv (delimited text files), and topojson.
parse String | Object If set to auto (the default), perform automatic type inference to determine the desired data types. Alternatively, a parsing directive object can be provided for explicit data types. Each property of the object corresponds to a field name, and the value to the desired data type (one of "boolean", "date", "number" or "string"). For example, "parse": {"modified_on": "date"} parses the modified_on field in each input record as a Date value. Specific date formats can be provided (e.g., {"foo": "date:'%m%d%Y'"}), using the d3-time-format syntax. UTC date format parsing is supported similarly (e.g., {"foo": "utc:'%m%d%Y'"}).
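
For example, a CSV data set might declare explicit parsing directives for its fields. The file name and columns below are illustrative; the date directive uses the d3-time-format syntax described above:

{
  "name": "stocks",
  "url": "data/stocks.csv",
  "format": {
    "type": "csv",
    "parse": {"price": "number", "date": "date:'%b %d %Y'"}
  }
}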

json

Loads a JavaScript Object Notation (JSON) file. Assumes row-oriented data, where each row is an object with named attributes. This is the default file format, and so will be used if no format parameter is provided. If specified, the format parameter should have a type property of "json", and can also accept the following:

Name Type Description
property String The JSON property containing the desired data. This parameter can be used when the loaded JSON file may have surrounding structure or meta-data. For example "property": "values.features" is equivalent to retrieving json.values.features from the loaded JSON object.
copy Boolean A boolean flag (default false) that indicates if input JSON data should be copied prior to use. This setting may be useful when providing as input pre-parsed JSON data (e.g., not loaded from a URL) that should not be modified.
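
For example, to load records nested within surrounding JSON structure, mirroring the json.values.features case described above (the file name and property path here are illustrative):

{
  "name": "features",
  "url": "data/features.json",
  "format": {"type": "json", "property": "values.features"}
}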

csv

Loads a comma-separated values (CSV) file. This format type does not support any additional properties.

tsv

Loads a tab-separated values (TSV) file. This format type does not support any additional properties.

dsv

Loads a delimited text file with a custom delimiter.

Name Type Description
delimiter String Required. The delimiter between values (i.e., the character that separates fields within each record). The delimiter must be a single character (i.e., a single 16-bit code unit); so, ASCII delimiters are fine, but emoji delimiters are not.
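
For example, to load a pipe-delimited text file (the file name here is illustrative):

{
  "name": "records",
  "url": "data/records.txt",
  "format": {"type": "dsv", "delimiter": "|"}
}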

topojson

Loads a JavaScript Object Notation (JSON) file in the TopoJSON format. The input file must contain valid TopoJSON data, which is then converted into GeoJSON format for use within Vega. There are two mutually exclusive properties that can be used to specify the conversion process:

Name Type Description
feature String The name of the TopoJSON object set to convert to a GeoJSON feature collection. For example, in a map of the world, there may be an object set named "countries". Using the feature property, we can extract this set and generate a GeoJSON feature object for each country.
mesh String The name of the TopoJSON object set to convert to a mesh. Similar to the feature option, mesh extracts a named TopoJSON object set. Unlike the feature option, the corresponding geo data is returned as a single, unified mesh instance, not as individual GeoJSON features. Extracting a mesh is useful for more efficiently drawing borders or other geographic elements that you do not need to associate with specific regions, such as individual countries, states, or counties.
property String The JSON property containing the desired data. Similar to type=json, this optional parameter can be used when the loaded TopoJSON data has surrounding structure or meta-data.
copy Boolean A boolean flag (default false) that indicates if input JSON data should be copied prior to use. Similar to type=json, this setting may be useful when providing as input pre-parsed JSON data (e.g., not loaded from a URL) that should not be modified.
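
For example, to extract a GeoJSON feature collection from a TopoJSON file containing an object set named "countries" (the file name assumes a world map data set such as the one used in the Vega examples):

{
  "name": "countries",
  "url": "data/world-110m.json",
  "format": {"type": "topojson", "feature": "countries"}
}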

Examples

Here is an example defining data directly in a specification:

{"name": "table", "values": [12, 23, 47, 6, 52, 19]}

One can also load data from an external file (in this case, a JSON file):

{"name": "points", "url": "data/points.json"}

Or, one can simply declare the existence of a data set. The data can then be dynamically provided when the visualization is instantiated. See the View API documentation for more.

{"name": "table"}

Finally, one can draw from an existing data set and apply new data transforms. In this case, we create a new data set ("stats") by computing aggregate statistics for groups drawn from the source "table" data set:

{
  "name": "stats",
  "source": "table",
  "transform": [
    {
      "type": "aggregate",
      "groupby": ["x"],
      "ops": ["average", "sum", "min", "max"],
      "fields": ["y", "y", "y", "y"]
    }
  ]
}
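
Note that the source property may also be array-valued to merge (union) multiple data sets. The sketch below assumes data sets named "table1" and "table2" are defined elsewhere in the specification:

{"name": "combined", "source": ["table1", "table2"]}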