Data

Data set definitions and transforms define the data to load and how to process it.

The basic data model used by Vega is tabular data, similar to a spreadsheet or database table. Individual data sets are assumed to contain a collection of records (or “rows”), which may contain any number of named data attributes (fields, or “columns”). Records are modeled using standard JavaScript objects.

If the input data is simply an array of primitive values, Vega maps each value to the data property of a new object. For example, [5, 3, 8, 1] is loaded as:

[ {"data": 5}, {"data": 3}, {"data": 8}, {"data": 1} ]

Upon ingest, Vega also assigns each data object a unique id property, accessible via a custom Symbol. As a result, the id property is not accessible via a string key and is not enumerable, though you can observe the id value when inspecting data objects in a JavaScript console.

Data sets can be specified directly by defining data inline or providing a URL from which to load the data. Alternatively, data can be bound dynamically at runtime by using the View API to provide data when a chart is instantiated or issue streaming updates. Loading data from a URL will be subject to the policies of your runtime environment (e.g., cross-origin request rules).


Data Properties

Properties for specifying a data set. At most one of the source, url, or values properties should be defined.

| Property | Type | Description |
| --- | --- | --- |
| name | String | Required. A unique name for the data set. |
| format | Format | An object that specifies the format for parsing the data file or values. See the format reference for more. |
| source | String or String[ ] | The name of one or more data sets to use as the source for this data set. The source property is useful in combination with a transform pipeline to derive new data. If string-valued, indicates the name of the source data set. If array-valued, specifies a collection of data source names that should be merged (unioned) together. |
| url | String | A URL from which to load the data set. Use the format property to ensure the loaded data is correctly parsed. If the format property is not specified, the data is assumed to be in a row-oriented JSON format. |
| values | Any | The full data set, included inline. The values property allows data to be included directly within the specification itself. While most commonly an array of objects, other data types (such as CSV strings) may be used, subject to the format settings. |
| async | Boolean | (≥ 5.9) A boolean flag (default false) indicating if dynamic data loading or reformatting should occur asynchronously. If true, dataflow evaluation will complete, data loading will occur in the background, and the dataflow will be re-evaluated when loading is complete. If false, dataflow evaluation will block until loading is complete and then continue within the same evaluation cycle. The use of async can allow multiple dynamic datasets to be loaded simultaneously while still supporting interactivity. However, the use of async can cause datasets to remain empty while the rest of the dataflow is evaluated, potentially affecting downstream computation. |
| on | Trigger[ ] | An array of updates to insert, remove, and toggle data values, or clear the data when trigger conditions are met. See the trigger reference for more. |
| transform | Transform[ ] | An array of transforms to perform on the input data. The output of the transform pipeline then becomes the value of this data set. See the transform reference for more. |
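
As a rough illustration of how these properties combine, the sketch below defines two inline data sets and a third that merges them through an array-valued source before applying a filter transform. The data set names (tableA, tableB, combined), the field names, and the filter expression are placeholders chosen for this example only.

```json
[
  {"name": "tableA", "values": [{"x": 1, "y": 28}, {"x": 2, "y": 55}]},
  {"name": "tableB", "values": [{"x": 3, "y": 43}, {"x": 4, "y": 91}]},
  {
    "name": "combined",
    "source": ["tableA", "tableB"],
    "transform": [
      {"type": "filter", "expr": "datum.y > 30"}
    ]
  }
]
```

Each entry defines only one of source, url, or values, and the source data sets are listed before the data set that draws from them.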

Dynamic Data Loading

In Vega 4.2 and higher (≥ 4.2), the data url parameter and (when used with URL loading) the format parameter may include signal references. This feature allows either the source url or one or more formatting parameters to be changed dynamically at runtime, causing the data to be reloaded. For example, a single spec might load a different dataset based on user input, or the data might be polled at a regular interval in conjunction with a timer event stream.

If no signals are used (the traditional configuration), external data sources are loaded immediately upon view construction and the first dataflow evaluation is delayed until data loading is complete. For dynamic loading, the dataflow must first be evaluated in order to determine the signal values, and only then can the data be loaded. As a result, downstream transforms and encodings may initially be evaluated with empty datasets. Be sure any signal expressions behave appropriately with empty data, including downstream concerns such as empty scale domains.
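
A minimal sketch of dynamic loading might look like the following, assuming a signal named dataUrl that is bound to a select input. The signal name, the data set name, and both file paths are hypothetical and stand in for whatever your application provides.

```json
{
  "signals": [
    {
      "name": "dataUrl",
      "value": "data/first.json",
      "bind": {
        "input": "select",
        "options": ["data/first.json", "data/second.json"]
      }
    }
  ],
  "data": [
    {
      "name": "table",
      "url": {"signal": "dataUrl"},
      "async": true
    }
  ]
}
```

When the signal value changes, the table data set is reloaded from the new URL. The async flag is optional here and requires Vega 5.9 or higher; with it enabled, downstream transforms may briefly see an empty data set.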

Format

The format object describes the data format and additional parsing instructions.

| Name | Type | Description |
| --- | --- | --- |
| type | String | The data format type. The currently supported data formats are json (the default), csv (comma-separated values), tsv (tab-separated values), dsv (delimited text files), and topojson. |
| parse | String or Object | If set to auto, perform automatic type inference to determine the desired data types. Alternatively, a parsing directive object can be provided for explicit data types. Each property of the object corresponds to a field name, and the value to the desired data type (one of "boolean", "date", "number" or "string"). For example, "parse": {"modified_on": "date"} parses the modified_on field in each input record as a Date value. Specific date formats can be provided (e.g., {"foo": "date:'%m%d%Y'"}), using the d3-time-format syntax. UTC date format parsing is supported similarly (e.g., {"foo": "utc:'%m%d%Y'"}). |
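
To make the parse directive concrete, here is a small sketch of a data set that loads a CSV file and assigns explicit types to three fields. The file path and the field names other than modified_on are assumptions made for illustration.

```json
{
  "name": "events",
  "url": "data/events.csv",
  "format": {
    "type": "csv",
    "parse": {
      "modified_on": "date",
      "count": "number",
      "created": "date:'%m%d%Y'"
    }
  }
}
```

Replacing the parse object with "parse": "auto" would instead ask Vega to infer the types.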

json

Load a JavaScript Object Notation (JSON) file. Assumes row-oriented data, where each row is an object with named attributes. This is the default file format, and so will be used if no format parameter is provided. If specified, the format parameter should have a type property of "json", and can also accept the following:

| Name | Type | Description |
| --- | --- | --- |
| property | String | The JSON property containing the desired data. This parameter can be used when the loaded JSON file may have surrounding structure or meta-data. For example, "property": "values.features" is equivalent to retrieving json.values.features from the loaded JSON object. |
| copy | Boolean | A boolean flag (default false) that indicates if input JSON data should be copied prior to use. This setting may be useful when providing as input pre-parsed JSON data (e.g., not loaded from a URL) that should not be modified. |
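
As an illustration of the property parameter, the sketch below assumes a hypothetical file data/response.json whose records are nested under values.features, surrounded by other metadata.

```json
{
  "name": "features",
  "url": "data/response.json",
  "format": {
    "type": "json",
    "property": "values.features"
  }
}
```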

csv

Load a comma-separated values (CSV) file.

| Name | Type | Description |
| --- | --- | --- |
| header | String[ ] | An array of field names to prepend to the data as a header row. A header should only be supplied if the input data does not already include one. |
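
The following sketch shows one way the header property might be used with headerless CSV text supplied inline through values. The data set name, the field names x and y, and the sample rows are invented for this example.

```json
{
  "name": "table",
  "values": "1,28\n2,55\n3,43",
  "format": {
    "type": "csv",
    "header": ["x", "y"],
    "parse": "auto"
  }
}
```

Without a parse setting, delimited text values are read as strings, so "parse": "auto" is included here to request type inference.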

tsv

Load a tab-separated values (TSV) file.

| Name | Type | Description |
| --- | --- | --- |
| header | String[ ] | An array of field names to prepend to the data as a header row. A header should only be supplied if the input data does not already include one. |

dsv

Load a delimited text file with a custom delimiter.

| Name | Type | Description |
| --- | --- | --- |
| delimiter | String | Required. The delimiter between field values within each record. The delimiter must be a single character (i.e., a single 16-bit code unit); so, ASCII delimiters are fine, but emoji delimiters are not. |
| header | String[ ] | An array of field names to prepend to the data as a header row. A header should only be supplied if the input data does not already include one. |
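
For instance, a pipe-delimited file could be loaded roughly as follows. The file path and data set name are placeholders.

```json
{
  "name": "table",
  "url": "data/table.psv",
  "format": {
    "type": "dsv",
    "delimiter": "|"
  }
}
```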

topojson

Load a JavaScript Object Notation (JSON) file using the TopoJSON format. The input file must contain valid TopoJSON data. The TopoJSON input is then converted into a GeoJSON format for use within Vega. Two mutually exclusive properties, feature and mesh, specify the conversion process; the remaining properties are optional:

| Name | Type | Description |
| --- | --- | --- |
| feature | String | The name of the TopoJSON object set to convert to a GeoJSON feature collection. For example, in a map of the world, there may be an object set named "countries". Using the feature property, we can extract this set and generate a GeoJSON feature object for each country. |
| mesh | String | The name of the TopoJSON object set to convert to a mesh. Similar to the feature option, mesh extracts a named TopoJSON object set. Unlike the feature option, the corresponding geo data is returned as a single, unified mesh instance, not as individual GeoJSON features. Extracting a mesh is useful for more efficiently drawing borders or other geographic elements that you do not need to associate with specific regions such as individual countries, states or counties. |
| filter | String | (≥ 5.4) An optional filter to apply to an extracted mesh. If set to "interior", only interior region boundaries are included, filtering out exterior borders. If set to "exterior", only the exterior border is included, filtering out all internal boundaries. If null or unspecified (the default), no filtering is performed. This property applies to mesh extraction only, not feature extraction. |
| property | String | The JSON property containing the desired data. Similar to type=json, this optional parameter can be used when the loaded TopoJSON data has surrounding structure or meta-data. |
| copy | Boolean | A boolean flag (default false) that indicates if input JSON data should be copied prior to use. Similar to type=json, this setting may be useful when providing as input pre-parsed JSON data (e.g., not loaded from a URL) that should not be modified. |
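
The two conversion modes can be sketched side by side. The example below assumes a hypothetical TopoJSON file data/world.json containing an object set named "countries"; the first data set extracts one GeoJSON feature per country, and the second extracts only the shared interior borders as a single mesh.

```json
[
  {
    "name": "countries",
    "url": "data/world.json",
    "format": {"type": "topojson", "feature": "countries"}
  },
  {
    "name": "borders",
    "url": "data/world.json",
    "format": {"type": "topojson", "mesh": "countries", "filter": "interior"}
  }
]
```

The filter property in the second data set requires Vega 5.4 or higher.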

Examples

Here is an example defining data directly in a specification:

{"name": "table", "values": [12, 23, 47, 6, 52, 19]}

One can also load data from an external file (in this case, a JSON file):

{"name": "points", "url": "data/points.json"}

Or, one can simply declare the existence of a data set. The data can then be dynamically provided when the visualization is instantiated. See the View API documentation for more.

{"name": "table"}

Finally, one can draw from an existing data set and apply new data transforms. In this case, we create a new data set ("stats") by computing aggregate statistics for groups drawn from the source "table" data set:

{
  "name": "stats",
  "source": "table",
  "transform": [
    {
      "type": "aggregate",
      "groupby": ["x"],
      "ops": ["average", "sum", "min", "max"],
      "fields": ["y", "y", "y", "y"]
    }
  ]
}