Mapping Airport Connections Tutorial

From massive hubs to small local outposts, the United States air traffic system consists of a rich set of connections among hundeds of airports. In this tutorial, we will visualize this system of airports and connections, based on data from the year 2008. Our goal will be to create our own version of Mike Bostock’s D3.js airports map.

In the process, we will touch upon many of the features supported by Vega: transforming data, combining multiple data sets, handling user input, mapping geographic data, and plotting both tabular and network data. Here’s the steps we will follow:

  1. Meet the Data
  2. Visualization Scaffolding
  3. Draw a U.S. State Map
  4. Plot All the Airports
  5. Filter the Airports by Traffic
  6. Size the Airports by Traffic
  7. Show Airport Name on Mouse Hover
  8. Sort the Airports by Size
  9. Plot the Connections among Airports
  10. Show Connections on Mouse Hover
  11. Add Mouse Acceleration via Voronoi Cells
  12. One Last Thing…

As you work through the tutorial, we encourage you to build up and experiment with each step in the online Vega Editor.

Step 1: Meet the Data

Let’s begin by examining our raw ingredients. We will build the visualization from three data sets; each resides in the data subfolder and are available within the Vega Editor.

us-10m.json

This is a TopoJSON file that contains the boundaries of U.S. states at a scale of 1:10,000,000. We will use this file to draw a map of U.S. states as the base layer of our visualization. If you are interested in generating your own TopoJSON data, check out the U.S. Atlas TopoJSON tools.

airports.csv

This data set contains a list of U.S. airports in CSV (comma-separated value) format. The first column (iata) contains a unique identifier provided by the International Air Transport Association. Subsequent columns describe the name and location of the airport, including latitude and longitude coordinates. We will use these coordinates to plot the airports.

iata,name,city,state,country,latitude,longitude
00M,Thigpen,Bay Springs,MS,USA,31.95376472,-89.23450472
00R,Livingston Municipal,Livingston,TX,USA,30.68586111,-95.01792778
00V,Meadow Lake,Colorado Springs,CO,USA,38.94574889,-104.5698933
01G,Perry-Warsaw,Perry,NY,USA,42.74134667,-78.05208056

flights-airport.csv

This data set contains flight information for the year 2008. Each record consists of an origin airport (identified by IATA id), a destination airport, and the count of flights along this route. We will use this dataset to compute per-airport traffic and to plot connections among airports.

origin,destination,count
ABE,ATL,853
ABE,BHM,1
ABE,CLE,805
ABE,CLT,465

Step 2: Visualization Scaffolding

Now let’s get started building our visualization! We begin with a basic scaffold:

{
  "$schema": "https://vega.github.io/schema/vega/v3.0.json",
  "width": 900,
  "height": 560,
  "padding": {"top": 0, "left": 0, "right": 0, "bottom": 0},
  "signals": [],
  "data": [],
  "scales": [],
  "projections": [],
  "marks": []
}

We first set the width (900 pixels) and height (560 pixels) of the view, and initially set the padding to zero (we will adjust this later on). We also include empty arrays for:

  • signals for variables that dynamically react to user input,
  • data to load and transform data sets,
  • scales to map data values to visual variables,
  • projections to draw a base map showing U.S. states, and
  • marks for graphical elements that visualize data.

We will fill in each of these aspects as we progress.

Step 3: Draw a U.S. State Map

Now that we have a basic setup, let’s add a U.S. state map as the background layer of our visualization. This requires two steps: preparing the data and drawing the map.

Load TopoJSON Data

To load the geographic data, we add an entry to the data array:

"data": [
  {
    "name": "states",
    "url": "data/us-10m.json",
    "format": {"type": "topojson", "feature": "states"}
  }
],

This entry defines a new dataset named states, and loads it from the provided URL. We use the format property to indicate that this is topojson data and that we wish to use the feature named states. TopoJSON files may include any number of arbitrarily-named features; us-10m.json includes the features states and counties.

We can now load the data, and Vega unpacks the requested feature into a collection of latitude and longitude coordinates in GeoJSON format. To visualize the states, we next want to add a cartographic projection to map from (longitude, latitude) coordinates to (x, y) coordinates. While a number of projections might be reasonable choices, the Albers projection preserves area, and the special albersUsa version places Alaska and Hawaii in convenient viewing locations, which will allow us to better see flights originating in those states.

"projections": [
  {
    "name": "projection",
    "type": "albersUsa",
    "scale": 1200,
    "translate": [{"signal": "width / 2"}, {"signal": "height / 2"}]
  }
],

In addition to the projection type, we supply scale and translate parameters to set the zoom level and relative position of the map. We use signal expressions to ensure that we translate the map to the center of the view. With our projection defined, we can update our data set definition to generate paths for each state using the geopath transform:

"data": [
  {
    "name": "states",
    "url": "data/us-10m.json",
    "format": {"type": "topojson", "feature": "states"},
    "transform": [
      {
        "type": "geopath",
        "projection": "projection"
      }
    ]
  }
],

For each state, the geopath transform sets a path property that contains outlines for each state as SVG path strings.

Draw State Boundaries

Once we’ve generated outline paths for each state, plotting them is straightforward:

"marks": [
  {
    "type": "path",
    "from": {"data": "states"},
    "encode": {
      "enter": {
        "fill": {"value": "#dedede"},
        "stroke": {"value": "white"}
      },
      "update": {
        "path": {"field": "path"}
      }
    }
  }
]

Here we add a definition for a path mark to the marks array. In the enter block we set the fill color to light grey, and set the stroke color to white. In the update block we use the path property for the path outline. (We place the path encoding in the update block to ensure that the map outlines update if we change the size of the visualization or modify other aspects.)

Et voilà! Our visualization now contains a base map. If you are feeling adventurous, try experimenting with changing the map projection, adjusting the projection parameters, and modifying the visualization size!

Step 4: Plot All the Airports

Now we can plot the airports on our map. We load the airport data by adding a new airports entry to the end of the data array:

{
  "name": "airports",
  "url": "data/airports.csv",
  "format": {"type": "csv", "parse": "auto"}
}

We use the format property to indicate that this is csv data and that we should auto parse the data values. Vega will automatically try to determine which columns are numbers, which are strings, and so on.

To plot the data, we need to project the longitude and latitude variables to x and y coordinates. Here we use the geopoint transform with the same projection as before. By default, geopoint writes the projected coordinates to the x and y fields of the data objects (though you can use the as parameter to provide different field names).

{
  "name": "airports",
  "url": "data/airports.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    {
      "type": "geopoint",
      "projection": "projection",
      "fields": ["longitude", "latitude"]
    },
    {
      "type": "filter",
      "expr": "datum.x != null && datum.y != null"
    }
  ]
}

In addition, we add a filter transform to remove any data points with null coordinates. Airports outside the 50 states and Washington D.C. (such as those in U.S. territories) are not supported by the albersUsa projection. The projection returns null x and y values for those airports.

To visualize the airports, we add a new symbol mark entry to the marks array:

{
  "type": "symbol",
  "from": {"data": "airports"},
  "encode": {
    "enter": {
      "size": {"value": 16},
      "fill": {"value": "steelblue"},
      "fillOpacity": {"value": 0.8},
      "stroke": {"value": "white"},
      "strokeWidth": {"value": 1.5}
    },
    "update": {
      "x": {"field": "x"},
      "y": {"field": "y"}
    }
  }
}

The symbol mark type defaults to circles if no shape property is provided. We position each airport according to the x and y coordinates set by the geopoint transform. We also set a number of constant values for the size, fill and stroke.

The result is a map of all airports in the United States. That’s a lot of airports!

Step 5: Filter the Airports by Traffic

We’d now like to visualize more information about the airports. How much traffic does each airport get in 2008? In the process, we can also filter out those airports that did not service any commercial flights in 2008. To incorporate this information, we need to load the flights data.

We can add the following entry to the the data array, before the airports data:

{
  "name": "traffic",
  "url": "data/flights-airport.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    {
      "type": "aggregate",
      "groupby": ["origin"],
      "fields": ["count"], "ops": ["sum"], "as": ["flights"]
    }
  ]
}

We load the data and parse the CSV file. We then use an aggregate transform to count the total number of flights originating at each airport. The resulting traffic dataset is a table with two variables: an origin airport and the total count of flights that departed that airport.

Now that we have a measure of per-airport traffic, we would like to combine (or in database terms, join) this data with our original airports data. To do this, we use Vega’s lookup transform. This transform looks for specified key values of a primary dataset within a secondary dataset, and if found adds the matched data record as a new property of the primary data.

{
  "name": "airports",
  "url": "data/airports.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    {
      "type": "lookup",
      "from": "traffic", "key": "origin",
      "fields": ["iata"], "as": ["traffic"]
    },
    {
      "type": "filter",
      "expr": "datum.traffic != null"
    },
    {
      "type": "geopoint",
      "projection": "projection",
      "fields": ["longitude", "latitude"]
    },
    {
      "type": "filter",
      "expr": "datum.x != null && datum.y != null"
    }
  ]
}

Here the airports data is our primary dataset and the traffic data is used as the lookup table. For each airport, we look for a record in the traffic data whose origin property matches the iata property of the airport. If a match is found, we add the traffic record to the airport data under a property named traffic.

We also add a new filter transform to remove airports for which we fail to find a match in the traffic dataset (indicated by a null value). This step filters out all the airports for which we observe no originating flights in 2008. Though we could combine the filter criteria into a single filter instance, here we filter out the extraneous airports up front so that we don’t waste time needlessly computing geo-coordinates.

We now see only those airports that are included in the flights-airport.csv data.

Step 6: Size the Airports by Traffic

We can also use the traffic data to size each airport based on the number of originating flights. We first add an entry to the scales array to map from traffic to circular area:

"scales": [
  {
    "name": "size",
    "type": "linear",
    "domain": {"data": "traffic", "field": "flights"},
    "range": [16, 1000]
  }
],

Here we define a linear scale that maps from the domain of flight counts to a range of [16, 1000] pixels. At this point, you might be thinking: Why a linear scale? If we map that to the radius of a circle, won’t we exaggerate the area and mislead our viewers? (In general you would be correct!) However, in this case we are using the size parameter of our symbol marks: this property sets the area, not the radius, of the symbol.

{
  "type": "symbol",
  "from": {"data": "airports"},
  "encode": {
    "enter": {
      "size": {"scale": "size", "field": "traffic.flights"},
      "fill": {"value": "steelblue"},
      "fillOpacity": {"value": 0.8},
      "stroke": {"value": "white"},
      "strokeWidth": {"value": 1.5}
    },
    "update": {
      "x": {"field": "x"},
      "y": {"field": "y"}
    }
  }
}

Here we modify our airport symbol marks. The size property is now set by running traffic.flights through our size scale. We also set fillOpacity to create a transparency effect and increase the strokeWidth.

We can now see the massive hubs, the local outposts, and everything in-between!

Step 7: Show Airport Name on Mouse Hover

While we can see differences between airports, it would be nice to see their names, too. We will now add interactions to show the airport name and IATA id upon mouse hover. To do so, we add two entries to the signals array:

"signals": [
  {
    "name": "hover",
    "value": null,
    "on": [
      {"events": "symbol:mouseover", "update": "datum"},
      {"events": "symbol:mouseout", "update": "null"}
    ]
  },
  {
    "name": "title",
    "value": "U.S. Airports, 2008",
    "update": "hover ? hover.name + ' (' + hover.iata + ')' : 'U.S. Airports, 2008'"
  }
],

Signals are variables that can change dynamically in response to user input. Each signal consists of a name, an initial value, and an optional set of one or more update rules. The update rules are defined as a set of handlers for input events that update the signal value. To learn more about expressions, see the expression language documentation.

The hover signal contains the current airport record (or datum) under the mouse cursor, or null if no airport is being hovered over. The on entries state that upon mousover of a symbol mark, set the signal value to the current datum. Upon a mouseout event, set the signal value to null. In short, this signal tracks the data of the currently hovered symbol.

The title signal contains a string describing the currently hovered airport, or a generic title string if no airport is selected. The update entry responds to changes in the hover signal, producing a string containing the airport name and iata id if the hover data point is defined.

To show this information, we add a new text mark entry to the end of the marks array:

 {
  "type": "text",
  "interactive": false,
  "encode": {
    "enter": {
      "x": {"signal": "width", "offset": -5},
      "y": {"value": 0},
      "fill": {"value": "black"},
      "fontSize": {"value": 20},
      "align": {"value": "right"}
    },
    "update": {
      "text": {"signal": "title"}
    }
  }
}

Note that this text mark definition does not have a from property indicating a backing data source. If the from property is omitted, Vega defaults to generating a single mark instance. We set the text layout and appearance. We use a combination of a signal expression and offset to position the title 5 pixels in from the right side of the view. We set the text content to be the title signal. As the title signal changes, the text will reactively update in response.

Unfortunately, there is still a problem: we positioned the text at the top-right of the display, such that much of the text will actually be drawn off-screen!

"padding": {"top": 25, "left": 0, "right": 0, "bottom": 0},

To fix the problem, we can add 25 pixels of padding to the top of the visualization.

Now we can mouse over each airport to see more information.

Step 8: Sort the Airports by Size

Notice any problems with our visualization? Take a look at Chicago: O’Hare completely covers Midway! Or look at Dallas, where DFW clobbers DAL. When visualizing points of varying size, it is common to sort by size, such that smaller elements are drawn on top of larger elements.

{
  "name": "airports",
  "url": "data/airports.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    {
      "type": "lookup",
      "from": "traffic", "key": "origin",
      "fields": ["iata"], "as": ["traffic"]
    },
    {
      "type": "filter",
      "expr": "datum.traffic != null"
    },
    {
      "type": "geopoint",
      "projection": "projection",
      "fields": ["longitude", "latitude"]
    },
    {
      "type": "filter",
      "expr": "datum.x != null && datum.y != null"
    },
    {
      "type": "collect", "sort": {
        "field": "traffic.flights",
        "order": "descending"
      }
    }
  ]
}

We now add a collect transform to the airports data: this operation collects all the object in a data stream, allowing us to sort them. We include a sort parameter that indicates we should sort by the flight counts data field in a descending sort order. This sorting causes larger points to be drawn before smaller points.

That’s better!

Step 9: Plot the Connections among Airports

Now we’re ready to visualize the connections among the airports.

Let’s add a new entry (routes) to the end of the data array:

{
  "name": "routes",
  "url": "data/flights-airport.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    {
      "type": "lookup",
      "from": "airports", "key": "iata",
      "fields": ["origin", "destination"], "as": ["source", "target"]
    },
    {
      "type": "filter",
      "expr": "datum.source && datum.target"
    },
    {
      "type": "linkpath",
      "shape": "line"
    }
  ]
}

After loading the CSV file, we apply a series of transforms. First, we need the coordinates for both the origin and destination airports so we perform a lookup against the airports data. The transform is similar to our earlier lookup against the traffic data, except here we perform two lookups at once to grab both the origin and destination. Second, we filter out any routes for which the lookup failed. This keeps us safe if the routes data contains any origins or destinations not present in the airports data.

Finally, we add a linkpath transform, which generates an SVG path based on link end points. The linkpath shape defaults to a straight line, but we explicitly set the shape parameter to line to make our intention clear. (For a different style, try setting the shape to curve instead!)

To determine the link end points, the linkpath transform looks for source.x and source.y properties (and similarly for target) by default. In this case, we don’t need to provide any additional configuration – that is why we set source and target as the output properties of the lookup transform!

Now we can add a new path mark to the marks array to visualize the routes:

{
  "type": "path",
  "interactive": false,
  "from": {"data": "routes"},
  "encode": {
    "enter": {
      "path": {"field": "layout_path"},
      "stroke": {"value": "black"},
      "strokeOpacity": {"value": 0.15}
    }
  }
}

We set the interactive property false to prevent the links from interfering with mouse events from the airport symbols. As there may be many links with lots of overlap, we also set the strokeOpacity to a low value (0.15).

What a mess! Let’s filter the connections and use interaction to show details-on-demand.

Step 10: Show Connections on Mouse Hover

Adding hover-sensitive filtering is now quite easy:

{
  "name": "routes",
  "url": "data/flights-airport.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    {
      "type": "filter",
      "expr": "hover && hover.iata == datum.origin"
    },
    ...
  ]
}

We already have a hover signal set up to track the data associated with the currently selected airport. We just need to add a filter transform to our routes data: the new filter at the beginning of the transform list keeps only those routes whose origin matches the selected airport. In addition, we increase the path strokeOpacity to 0.35 to be a bit more opaque.

Now we can interactively explore the network of routes!

Step 11: Add Mouse Acceleration via Voronoi Cells

We now have a nice interactive visualization, but we can make it even better. Some of the airports are rather small and hard to select. Instead of having to hover directly over a point, let’s update the visualization to select the nearest airport to the mouse cursor.

To do so, we can create a Voronoi diagram, which subdivides space into the cells containing all points closest to our data points. We can then filter the routes based on mouse over of the Voronoi cells. To compute the Voronoi cells, we add a voronoi transform to the airports data before the sort operation:

{
  "name": "airports",
  "url": "data/airports.csv",
  "format": {"type": "csv", "parse": "auto"},
  "transform": [
    ...
    {
      "type": "voronoi", "x": "x", "y": "y"
    },
    {
      "type": "collect", "sort": {
        "field": "traffic.flights",
        "order": "descending"
      }
    }
  ]
},

The voronoi transform computes the enclosing cells for each airport using the x and y coordinates. The output is an SVG path string written to the path property.

Next, we add the Voronoi cells to the visualization as an invisible set of path marks. We add the following to the data array, right after the airport symbol marks:

{
  "type": "path",
  "name": "cell",
  "from": {"data": "airports"},
  "encode": {
    "enter": {
      "fill": {"value": "transparent"}
    },
    "update": {
      "path": {"field": "path"}
    }
  }
},

We set the fill to transparent to ensure that the Voronoi cells can’t be seen, but still receive input events. We also specify a name property so that we can refer to these marks elsewhere.

Finally, we update our hover signal to respond to mouse events on the Voronoi cells instead of on the circle symbol marks:

{
  "name": "hover",
  "value": null,
  "on": [
    {"events": "@cell:mouseover", "update": "datum"},
    {"events": "@cell:mouseout", "update": "null"}
  ]
},

Here we simply replace symbol: with @cell:. The @name pattern selects only events originating from a mark with the provided name.

Now we have much improved, user-friendly mouse selection!

Step 12: One Last Thing…

We now have a complete interactive visualization! But before we wrap up, let’s add a little easter egg. We use invisible Voronoi cells to help with mouse selections. In order to better understand them, it might be nice to actually see them! Let’s toggle their visibility with a double click…

We add a new signal named cell_stroke to our signals array:

{
  "name": "cell_stroke",
  "value": null,
  "on": [
    {"events": "dblclick", "update": "cell_stroke ? null : 'brown'"},
    {"events": "mousedown!", "update": "cell_stroke"}
  ]
}

This signal responds to any double click (dblclick) event, toggling the cell_stroke signal between null and 'brown'. However, the double click action may cause text on the page to be automatically selected due to the web browser’s default behavior. To prevent text selection on double click, we include an extra event handler for mousedown events. The exclamation point (mousedown!) indicates that the handler should consume the event, thereby preventing side-effects such as text selection; the signal itself simply retains its current value.

Finally, we update our Voronoi cell marks:

{
  "type": "path",
  "name": "cell",
  "from": {"data": "airports"},
  "encode": {
    "enter": {
      "fill": {"value": "transparent"},
      "strokeWidth": {"value": 0.35}
    },
    "update": {
      "path": {"field": "path"},
      "stroke": {"signal": "cell_stroke"}
    }
  }
},

We add strokeWidth to the enter properties, and add stroke color to the update properties. We set stroke to be the value of the cell_stroke signal, so that the Voronoi cells will automatically toggle their visibility upon double click.

To test out our easter egg, scroll back to the top of this page and try double clicking the visualization! You may also now want to review our complete Vega specification.

Next Steps

Though we’ve reached the end of this tutorial, there are a number of additional variations you might attempt on your own:

  • Can you modify the signals to support touch events as well as mouse events?
  • In addition to showing links on hover, can you make the origin airport highlight in a new color?
  • When selecting an origin airport, can you make the destinations highlight?
  • Can you change the map projection to view the routes from a different perspective?
  • Can you collect additional data and visualize airports across the entire world?