Box Plot

// Single View Specification
{
  "data": ... ,
  "mark": "boxplot",
  "encoding": ... ,
  ...
}

A box plot summarizes a distribution of quantitative values using a set of summary statistics. The tick inside the box marks the median. The lower and upper edges of the box mark the first and third quartiles, respectively. What the ends of the whiskers represent depends on the type of box plot.

To create a box plot, set mark to "boxplot".


Box Plot Mark Properties

A boxplot’s mark definition contains the following properties:

Property Type Description
type String

Required. The mark type. This could be a primitive mark type (one of "bar", "circle", "square", "tick", "line", "area", "point", "geoshape", "rule", and "text") or a composite mark type ("boxplot", "errorband", "errorbar").

extent String | Number

The extent of the whiskers. Available options include:

  • "min-max": min and max are the lower and upper whiskers respectively.
  • A number representing a multiple of the interquartile range (IQR = Q3-Q1). This number is multiplied by the IQR. The product is added to the third quartile to get the upper whisker and subtracted from the first quartile to get the lower whisker.

Default value: 1.5.

orient String

Orientation of the box plot. This is normally determined automatically from the types of the fields on the x and y channels. However, an explicit orient can be specified when the orientation is ambiguous.

Default value: "vertical".

size Number

Size of the box and median tick of a box plot.

color String

Default color. Note that fill and stroke have higher precedence than color and will override color.

Default value: "#4682b4"

Note: This property cannot be used in a style config.

opacity Number

The opacity (value between [0,1]) of the mark.

Besides the properties listed above, "box", "median", "rule", "outliers", and "ticks" can be used to specify the underlying mark properties for different parts of the box plots as well.

Types of Box Plot

Vega-Lite supports two types of box plots, defined by the extent property in the mark definition object.

1) Tukey Box Plot is the default box plot in Vega-Lite. For a Tukey box plot, the whiskers span from Q1 - k * IQR to Q3 + k * IQR, where Q1 and Q3 are the first and third quartiles and IQR is the interquartile range (Q3-Q1). In this type of box plot, you can specify the constant k by setting the extent. If there are outlier points beyond the whiskers, they will be displayed using point marks.

By default, the extent is 1.5.
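As an illustration, a minimal sketch of a default (Tukey) box plot could look like the following. It assumes the population.json sample dataset from vega-datasets, with fields age and people, is reachable at the relative URL shown (as it is in the Vega editor); substitute your own data source and field names as needed.

```json
{
  "data": {"url": "data/population.json"},
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
```

Because no extent is given, the whiskers here should use the default k of 1.5.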

Explicitly setting extent to 1.5 produces an identical plot.
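To set the extent explicitly, the mark is written as a mark definition object rather than a plain string. The sketch below assumes the population.json sample dataset (fields age and people) from vega-datasets; the data URL and field names are illustrative.

```json
{
  "data": {"url": "data/population.json"},
  "mark": {"type": "boxplot", "extent": 1.5},
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
```

Changing 1.5 to another number changes the constant k in the whisker formula.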

2) min-max Box Plot is a box plot where the lower and upper whiskers are defined as the min and max respectively. No points are considered outliers in this type of box plot.
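A min-max box plot can be sketched by setting extent to "min-max" in the mark definition. As before, the population.json dataset and its age and people fields are assumptions made for illustration.

```json
{
  "data": {"url": "data/population.json"},
  "mark": {"type": "boxplot", "extent": "min-max"},
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
```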

Dimension & Orientation

Vega-Lite supports both 1D and 2D box plots:

A 1D box plot shows the distribution of a continuous field.

A boxplot’s orientation is automatically determined by the continuous field axis. For example, you can create a vertical 1D box plot by encoding a continuous field on the y axis.
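A vertical 1D box plot might be specified as in the sketch below, which encodes a single quantitative field on y. The population.json dataset and the people field are assumed for illustration only.

```json
{
  "data": {"url": "data/population.json"},
  "mark": "boxplot",
  "encoding": {
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
```

Moving the same field definition from y to x should yield a horizontal 1D box plot instead.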

A 2D box plot shows the distribution of a continuous field, broken down by categories.

For 2D box plots with one continuous field and one discrete field, the box plot will be horizontal if the continuous field is on the x axis.

Alternatively, if the continuous field is on the y axis, the box plot will be vertical.
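The horizontal case can be sketched as follows, with the continuous field (people) on x and the discrete field (age) on y. The dataset and field names are assumptions taken from the vega-datasets population.json sample.

```json
{
  "data": {"url": "data/population.json"},
  "mark": "boxplot",
  "encoding": {
    "y": {"field": "age", "type": "ordinal"},
    "x": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
```

Swapping the x and y channel definitions gives the vertical variant.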

The Parts of Box Plots

Under the hood, the "boxplot" mark is a composite mark that expands into a layered plot. Even a basic 1D box plot is expanded into several layers.
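The following is a hand-written approximation of such an expansion for a min-max 1D box plot, not the exact output of the Vega-Lite compiler. It computes the summary statistics with an aggregate transform and draws two rules (whiskers), a bar (box), and a tick (median). The output field names (lower_box, upper_whisker, and so on), the mark sizes, and the population.json dataset are all assumptions chosen for illustration; a Tukey box plot would need additional steps to derive the whisker bounds and outliers.

```json
{
  "data": {"url": "data/population.json"},
  "transform": [
    {
      "aggregate": [
        {"op": "q1", "field": "people", "as": "lower_box"},
        {"op": "q3", "field": "people", "as": "upper_box"},
        {"op": "median", "field": "people", "as": "mid_box"},
        {"op": "min", "field": "people", "as": "lower_whisker"},
        {"op": "max", "field": "people", "as": "upper_whisker"}
      ]
    }
  ],
  "layer": [
    {
      "mark": "rule",
      "encoding": {
        "y": {
          "field": "lower_whisker",
          "type": "quantitative",
          "axis": {"title": "population"}
        },
        "y2": {"field": "lower_box"}
      }
    },
    {
      "mark": "rule",
      "encoding": {
        "y": {"field": "upper_box", "type": "quantitative"},
        "y2": {"field": "upper_whisker"}
      }
    },
    {
      "mark": {"type": "bar", "size": 14},
      "encoding": {
        "y": {"field": "lower_box", "type": "quantitative"},
        "y2": {"field": "upper_box"}
      }
    },
    {
      "mark": {"type": "tick", "color": "white", "size": 14},
      "encoding": {
        "y": {"field": "mid_box", "type": "quantitative"}
      }
    }
  ]
}
```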

To customize individual parts of a box plot, set the corresponding part properties in the box plot mark definition or config.

For example, we can customize the box plot’s "median" tick by setting its "color" to "red", and set "ticks" to true so that the box plot includes end ticks.
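A sketch of this customization is shown below. The "median" and "ticks" properties come from the list of box plot parts; the population.json dataset and the age and people fields are assumed for illustration.

```json
{
  "data": {"url": "data/population.json"},
  "mark": {
    "type": "boxplot",
    "extent": 1.5,
    "median": {"color": "red"},
    "ticks": true
  },
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    }
  }
}
```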

Color, Size, and Opacity Encoding Channels

You can customize the color, size, and opacity of a boxplot by using the color, size, and opacity encoding channels. The size is applied only to the box and the median tick. The color is applied only to the box, the median tick, and the outlier points. The opacity is applied to the whole boxplot.

An example of a boxplot where the size encoding channel is specified.
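A minimal sketch of such a specification follows, using a constant value for the size channel. The value 5 is arbitrary, and the population.json dataset with its age and people fields is an assumption.

```json
{
  "data": {"url": "data/population.json"},
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {
      "field": "people",
      "type": "quantitative",
      "axis": {"title": "population"}
    },
    "size": {"value": 5}
  }
}
```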

Mark Config

{
  "boxplot": {
    "size": ...,
    "extent": ...,
    "box": ...,
    "median": ...,
    "rule": ...,
    "outliers": ...,
    "ticks": ...
  }
}

The boxplot config object sets the default properties for boxplot marks.

The boxplot config can contain all boxplot mark properties, but it currently does not support color, opacity, and orient. Please see issue #3934.
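As a sketch, the boxplot config sits under the top-level "config" property of a specification. The example below sets defaults for extent, size, and the median tick, and avoids the unsupported color, opacity, and orient properties. The specific values and the population.json dataset are assumptions for illustration.

```json
{
  "data": {"url": "data/population.json"},
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "age", "type": "ordinal"},
    "y": {"field": "people", "type": "quantitative"}
  },
  "config": {
    "boxplot": {
      "extent": "min-max",
      "size": 20,
      "median": {"color": "red"}
    }
  }
}
```

Properties set in an individual mark definition should take precedence over these config defaults.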