Box Plot

// Single View Specification
{
  "data": ... ,
  "mark": "boxplot",
  "encoding": ... ,
  ...
}

A box plot summarizes a distribution of quantitative values using a set of summary statistics. The tick inside the box marks the median, and the lower and upper edges of the box mark the first and third quartiles, respectively. Depending on the type of box plot, the whisker ends represent either the minimum and maximum values or the most extreme values that lie within a multiple of the interquartile range from the quartiles.

To create a box plot, set mark to "boxplot".


Box Plot Mark Properties

A boxplot’s mark definition contains the following properties:

Property Type Description
type BoxPlot

Required. The mark type. This can be a primitive mark type (one of "bar", "circle", "square", "tick", "line", "area", "point", "geoshape", "rule", and "text") or a composite mark type ("boxplot", "errorband", "errorbar").

extent String | Number

The extent of the whiskers. Available options include:

  • "min-max": min and max are the lower and upper whiskers respectively.
  • A number k representing a multiple of the interquartile range. This number is multiplied by the IQR to determine the whisker boundaries: the whiskers span from the smallest data point to the largest data point within the range [Q1 - k * IQR, Q3 + k * IQR], where Q1 and Q3 are the first and third quartiles and IQR is the interquartile range (Q3 - Q1).

Default value: 1.5.

orient String

Orientation of the box plot. This is normally determined automatically from the types of the fields on the x and y channels. However, an explicit orient can be specified when the orientation is ambiguous.

Default value: "vertical".

size Number

Size of the box and median tick of a box plot.

color Color | Gradient | ExprRef

Default color.

Default value: "#4682b4"

Note:

  • This property cannot be used in a style config.
  • The fill and stroke properties have higher precedence than color and will override color.
opacity Number

The opacity (value between [0,1]) of the mark.

invalid String | Null

Invalid data mode, which defines how the marks and corresponding scales should represent invalid values (null and NaN in continuous scales without defined output for invalid values).

  • "filter": Exclude all invalid values from the visualization’s marks and scales. For path marks (for line, area, trail), this option will create paths that connect valid points, as if the data rows with invalid values do not exist.

  • "break-paths-filter-domains": Break path marks (for line, area, trail) at invalid values. For non-path marks, this is equivalent to "filter". All scale domains will exclude these filtered data points.

  • "break-paths-show-domains": Break paths (for line, area, trail) at invalid values. Hide invalid values for non-path marks. All scale domains will include these filtered data points (for both path and non-path marks).

  • "show" or null: Show all data points in the marks and scale domains. Each scale will use the output for invalid values defined in config.scale.invalid or, if unspecified, by default invalid values will produce the same visual values as zero (if the scale includes zero) or the minimum value (if the scale does not include zero).

  • "break-paths-show-path-domains" (default): This is equivalent to "break-paths-show-domains" for path-based marks (line/area/trail) and "filter" for non-path marks.

Note: If any channel’s scale has an output for invalid values defined in config.scale.invalid, all values for the scales will be considered “valid” since they can produce a reasonable output for the scales. Thus, fields for such channels will not be filtered and will not cause path breaks.

Besides the properties listed above, "box", "median", "rule", "outliers", and "ticks" can be used to specify the underlying mark properties for different parts of the box plots as well.

Types of Box Plot

Vega-Lite supports two types of box plots, defined by the extent property in the mark definition object.

  1. Tukey Box Plot is the default box plot in Vega-Lite. For a Tukey box plot, the whisker spans from the smallest data to the largest data within the range [Q1 - k * IQR, Q3 + k * IQR] where Q1 and Q3 are the first and third quartiles while IQR is the interquartile range (Q3-Q1). In this type of box plot, you can specify the constant k by setting the extent. If there are outlier points beyond the whisker, they will be displayed using point marks.

By default, the extent is 1.5.

Explicitly setting extent to 1.5 produces the following identical plot.
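A minimal sketch of a Tukey box plot with the extent set explicitly. The inline data and the field names "group" and "value" are hypothetical and chosen only for illustration; the value 40 in group A is meant to fall beyond the upper whisker so that it is drawn as an outlier point.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 13}, {"group": "A", "value": 14},
      {"group": "A", "value": 15}, {"group": "A", "value": 16},
      {"group": "A", "value": 40},
      {"group": "B", "value": 20}, {"group": "B", "value": 22},
      {"group": "B", "value": 23}, {"group": "B", "value": 25},
      {"group": "B", "value": 26}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": {"type": "boxplot", "extent": 1.5},
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```

Replacing the mark definition with the plain string "boxplot" should render the same chart, since 1.5 is the default extent.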

  2. min-max Box Plot is a box plot where the lower and upper whiskers are defined as the min and max respectively. No points are considered outliers for this type of box plot.
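A minimal sketch of a min-max box plot, again using hypothetical inline data and field names. With this extent, the value 40 in group A becomes the end of the upper whisker instead of being drawn as a separate point.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 13}, {"group": "A", "value": 14},
      {"group": "A", "value": 15}, {"group": "A", "value": 16},
      {"group": "A", "value": 40},
      {"group": "B", "value": 20}, {"group": "B", "value": 22},
      {"group": "B", "value": 23}, {"group": "B", "value": 25},
      {"group": "B", "value": 26}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": {"type": "boxplot", "extent": "min-max"},
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```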

Dimension & Orientation

Vega-Lite supports both 1D and 2D box plots:

1D box plot shows the distribution of a continuous field.

A boxplot’s orientation is automatically determined by the continuous field axis. For example, you can create a vertical 1D box plot by encoding a continuous field on the y axis.
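As a sketch of the vertical 1D case, the spec below encodes a single quantitative field on y and leaves x unset. The inline data and the field name "value" are hypothetical.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field name 'value' are hypothetical.",
  "data": {
    "values": [
      {"value": 10}, {"value": 12}, {"value": 13}, {"value": 14},
      {"value": 15}, {"value": 16}, {"value": 18}, {"value": 21}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "y": {"field": "value", "type": "quantitative"}
  }
}
```

Moving the same field definition from y to x should give a horizontal 1D box plot instead.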

2D box plot shows the distribution of a continuous field, broken down by categories.

For 2D box plots with one continuous field and one discrete field, the box plot will be horizontal if the continuous field is on the x axis.
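A sketch of the horizontal 2D case, with the quantitative field on x and a nominal field on y. The inline data and the field names "group" and "value" are hypothetical.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 16},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "value", "type": "quantitative"},
    "y": {"field": "group", "type": "nominal"}
  }
}
```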

Alternatively, if the continuous field is on the y axis, the box plot will be vertical.
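A sketch of the vertical 2D case, with the nominal field on x and the quantitative field on y. The inline data and the field names "group" and "value" are hypothetical.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 16},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```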

The Parts of Box Plots

Under the hood, the "boxplot" mark is a composite mark that expands into a layered plot. For example, a basic 1D boxplot shown above is expanded to:
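The exact compiled output is not reproduced here. As a rough, hand-written sketch of the idea, a vertical 1D min-max box plot can be approximated by aggregating the summary statistics and layering rule, bar, and tick marks. The inline data, the field name "value", and the output field names (lower_whisker, lower_box, mid_box, upper_box, upper_whisker) are assumptions made for illustration, and the Tukey variant needs additional steps to find whisker ends and outliers.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Hand-written approximation of a 1D min-max box plot; data and field names are hypothetical.",
  "data": {
    "values": [
      {"value": 10}, {"value": 12}, {"value": 13}, {"value": 14},
      {"value": 15}, {"value": 16}, {"value": 18}, {"value": 21}
    ]
  },
  "transform": [
    {
      "aggregate": [
        {"op": "min", "field": "value", "as": "lower_whisker"},
        {"op": "q1", "field": "value", "as": "lower_box"},
        {"op": "median", "field": "value", "as": "mid_box"},
        {"op": "q3", "field": "value", "as": "upper_box"},
        {"op": "max", "field": "value", "as": "upper_whisker"}
      ]
    }
  ],
  "layer": [
    {
      "mark": {"type": "rule"},
      "encoding": {
        "y": {"field": "lower_whisker", "type": "quantitative", "title": "value"},
        "y2": {"field": "lower_box"}
      }
    },
    {
      "mark": {"type": "rule"},
      "encoding": {
        "y": {"field": "upper_box", "type": "quantitative"},
        "y2": {"field": "upper_whisker"}
      }
    },
    {
      "mark": {"type": "bar", "size": 14},
      "encoding": {
        "y": {"field": "lower_box", "type": "quantitative"},
        "y2": {"field": "upper_box"}
      }
    },
    {
      "mark": {"type": "tick", "orient": "horizontal", "color": "white", "size": 14},
      "encoding": {
        "y": {"field": "mid_box", "type": "quantitative"}
      }
    }
  ]
}
```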

To customize individual parts of the box plot, set the corresponding part properties in the box plot mark definition or config.

For example, we can customize the box plot’s "median" tick by setting its "color" to "red", and set "ticks" to true so that the box plot includes end ticks:
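A sketch of such a mark definition in a complete spec. The inline data and the field names "group" and "value" are hypothetical.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 16},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": {
    "type": "boxplot",
    "extent": 1.5,
    "median": {"color": "red"},
    "ticks": true
  },
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```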

Color, Size, and Opacity Encoding Channels

You can customize the color, size, and opacity of the box in the boxplot by using the color, size, and opacity encoding channels. The size is applied to only the box and median tick. The color is applied to only the box and the outlier points. Meanwhile, the opacity is applied to the whole boxplot.

An example of a boxplot where the size encoding channel is specified.
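One way this might look is sketched below, using a constant value on the size channel. The inline data, the field names "group" and "value", and the size of 30 are assumptions made for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data, field names, and size value are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 16},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"},
    "size": {"value": 30}
  }
}
```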

Tooltip Encoding Channels

You can add custom tooltips to box plots. The custom tooltip will override the default boxplot’s tooltips.

If the field in the tooltip encoding is unaggregated, it replaces the tooltips of the outlier marks.
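A sketch of the unaggregated case, assuming hypothetical inline data and field names. The value 40 in group A is included so that there is an outlier point to hover over.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 13}, {"group": "A", "value": 14},
      {"group": "A", "value": 15}, {"group": "A", "value": 16},
      {"group": "A", "value": 40},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"},
    "tooltip": {"field": "value", "type": "quantitative"}
  }
}
```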

On the other hand, if the field in the tooltip encoding is aggregated, it replaces the tooltips of the box and whisker marks.
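A sketch of the aggregated case, using the mean as an example aggregate. The inline data, the field names, and the choice of "mean" are assumptions made for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data and the field names 'group' and 'value' are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 16},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"},
    "tooltip": {"aggregate": "mean", "field": "value", "type": "quantitative"}
  }
}
```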

Mark Config

{
  "boxplot": {
    "size": ...,
    "extent": ...,
    "box": ...,
    "median": ...,
    "rule": ...,
    "ticks": ...,
    "outliers": ...
  }
}

The boxplot config object sets the default properties for boxplot marks.

The boxplot config can contain all boxplot mark properties except color, opacity, and orient, which are not currently supported. Please see issue #3934.
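A sketch of a spec that sets box plot defaults through config rather than in the mark definition. The inline data, the field names, and the particular config values are assumptions made for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the inline data, field names, and config values are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 10}, {"group": "A", "value": 12},
      {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 16},
      {"group": "B", "value": 20}, {"group": "B", "value": 23},
      {"group": "B", "value": 25}, {"group": "B", "value": 28},
      {"group": "B", "value": 30}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  },
  "config": {
    "boxplot": {
      "extent": "min-max",
      "size": 20,
      "median": {"color": "red"}
    }
  }
}
```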

Box Plot with Pre-Calculated Summaries

If you have data summaries pre-calculated for a box plot, you can use layer to build a box plot like this:
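A minimal sketch of that approach, assuming each data row already holds the five summary values for one category. The field names (group, lower, q1, median, q3, upper) and the numbers are hypothetical, and outlier points are left out to keep the example short.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch; the summary values and field names are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "lower": 10, "q1": 12.5, "median": 14, "q3": 15.5, "upper": 16},
      {"group": "B", "lower": 20, "q1": 22.5, "median": 25, "q3": 27, "upper": 30}
    ]
  },
  "encoding": {
    "y": {"field": "group", "type": "nominal", "title": null}
  },
  "layer": [
    {
      "mark": {"type": "rule"},
      "encoding": {
        "x": {"field": "lower", "type": "quantitative", "scale": {"zero": false}, "title": null},
        "x2": {"field": "upper"}
      }
    },
    {
      "mark": {"type": "bar", "size": 14},
      "encoding": {
        "x": {"field": "q1", "type": "quantitative"},
        "x2": {"field": "q3"},
        "color": {"field": "group", "type": "nominal", "legend": null}
      }
    },
    {
      "mark": {"type": "tick", "color": "white", "size": 14},
      "encoding": {
        "x": {"field": "median", "type": "quantitative"}
      }
    }
  ]
}
```

The rule layer draws the whiskers, the bar layer draws the box from q1 to q3, and the tick layer draws the median. The shared y encoding places all three layers on the same row for each category.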