Vega Datasets is the centralized hub for over 70 datasets featured in the examples and documentation of Vega, Vega-Lite, Altair and related projects. A dataset catalog conforming to the Data Package Standard v2 provides information on data structure, sourcing, and licensing. Generation scripts document data provenance and transformation, enabling reproducibility and transparency throughout the data preparation process. Each dataset is curated to illustrate essential visualization concepts, statistical methods, or domain-specific applications.
This data lives at https://github.com/vega/vega-datasets and can be accessed via CDN at https://cdn.jsdelivr.net/npm/vega-datasets.
Modifications of existing datasets should be kept to a minimum as other projects (Vega, Vega Editor, Vega-Lite, Polestar, Voyager) use this data in their tests and examples. Contributions of new datasets, documentation, scripts, corrections and bug fixes are encouraged. Please review the contribution guidelines.
Important (Dataset Licensing): Each dataset hosted in this repository maintains its original license as documented in the datapackage metadata. While we’ve made efforts to provide accurate licensing information, this metadata should be considered a starting point rather than definitive guidance. Users should verify their intended use complies with original source licensing terms.
Install Vega Datasets via npm:
npm install vega-datasets
You can get the data directly via HTTP, served by GitHub or jsDelivr (a fast CDN), for example https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json.
A full listing of available datasets is at https://cdn.jsdelivr.net/npm/vega-datasets/data/.
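As a rough illustration of HTTP access, the sketch below builds a jsDelivr URL for a dataset file and fetches it. The helper names `datasetUrl` and `loadDataset` are hypothetical and not part of the vega-datasets package; the sketch assumes a runtime with a global `fetch` (a modern browser or Node 18+) and that JSON files should be parsed while other formats are returned as text.

```javascript
// Base URL taken from the CDN address quoted in this README.
const CDN_BASE = 'https://cdn.jsdelivr.net/npm/vega-datasets';

// Hypothetical helper: build the CDN URL for one dataset file.
// `version` may be any published npm version or tag, e.g. 'latest'.
function datasetUrl(fileName, version = 'latest') {
  return `${CDN_BASE}@${version}/data/${encodeURIComponent(fileName)}`;
}

// Hypothetical helper: fetch a dataset over HTTP.
// Assumption: '.json' files are parsed, everything else is returned as text.
async function loadDataset(fileName, version = 'latest') {
  const response = await fetch(datasetUrl(fileName, version));
  if (!response.ok) {
    throw new Error(`Failed to load ${fileName}: HTTP ${response.status}`);
  }
  return fileName.endsWith('.json') ? response.json() : response.text();
}

// Usage (inside an async context):
//   const cars = await loadDataset('cars.json');
```

Replacing the `latest` tag with a specific version number should keep results stable if a dataset changes in a later release.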
import data from 'vega-datasets';
const cars = await data['cars.json']();
// equivalent to
// const cars = await (await fetch(data['cars.json'].url)).json();
console.log(cars);
Reference a dataset via URL:
{
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "x": {"field": "Horsepower", "type": "quantitative"},
    "y": {"field": "Miles_per_Gallon", "type": "quantitative"}
  }
}
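If the spec is assembled in code rather than written by hand, the same structure can be produced with a small function. This is only a sketch: `scatterSpec` is a hypothetical helper, and it assumes both fields are quantitative, as in the cars example.

```javascript
// Hypothetical helper: build a Vega-Lite scatter plot spec for a dataset URL.
// Assumption: both fields are quantitative.
function scatterSpec(url, xField, yField) {
  return {
    data: { url },
    mark: 'point',
    encoding: {
      x: { field: xField, type: 'quantitative' },
      y: { field: yField, type: 'quantitative' },
    },
  };
}

// Rebuild the cars spec using the URL and field names from this README.
const carsSpec = scatterSpec(
  'https://cdn.jsdelivr.net/npm/vega-datasets@latest/data/cars.json',
  'Horsepower',
  'Miles_per_Gallon'
);

// Print the spec as JSON so it can be pasted into a Vega-Lite editor.
console.log(JSON.stringify(carsSpec, null, 2));
```

The resulting object can be handed to whichever Vega-Lite renderer the project already uses.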
Repository highlights include:
For the complete list and details, see the data directory or review the datapackage.md file.
Each dataset comes with:
Further information is available in datapackage.md (human-readable) and datapackage.json (machine-readable).
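Because datapackage.json is machine-readable, licensing and sourcing details can be pulled out programmatically. The sketch below assumes the descriptor follows the Data Package layout, with a top-level `resources` array whose entries carry `name`, `path`, and optional `licenses` and `sources` arrays. `summarizeResources` is a hypothetical helper, and the sample descriptor holds made-up values, not entries from the real catalog.

```javascript
// Hypothetical helper: summarize each resource in a parsed Data Package descriptor.
// Assumption: license and source entries are attached to each resource.
function summarizeResources(descriptor) {
  return (descriptor.resources ?? []).map((resource) => ({
    name: resource.name,
    path: resource.path,
    // A license entry may carry a name, a title, or only a path.
    licenses: (resource.licenses ?? []).map(
      (license) => license.name ?? license.title ?? license.path ?? 'unspecified'
    ),
    // A source entry may carry a title or only a path.
    sources: (resource.sources ?? []).map(
      (source) => source.title ?? source.path ?? 'unspecified'
    ),
  }));
}

// Made-up descriptor for illustration only; not taken from the real catalog.
const exampleDescriptor = {
  name: 'example-package',
  resources: [
    {
      name: 'example.json',
      path: 'data/example.json',
      licenses: [{ name: 'example-license' }],
      sources: [{ title: 'Example source' }],
    },
    { name: 'unlicensed.csv', path: 'data/unlicensed.csv' },
  ],
};

console.log(summarizeResources(exampleDescriptor));
```

An empty `licenses` list in the summary only means the metadata does not record a license, so the original source terms still need to be checked.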
Visualizations built with these datasets are showcased in several galleries:
Vega Datasets follows semantic versioning with additional data-specific guidelines:
For development setup:
npm install
For releasing:
npm run release
The repository code is licensed under the BSD-3-Clause License. Note that individual datasets have distinct licensing terms as specified in their metadata.
Appreciation is extended to the numerous organizations and individuals who have generously shared their data for use in this collection.