Box Plot
// Single View Specification
{
"data": ... ,
"mark": "boxplot",
"encoding": ... ,
...
}
A box plot summarizes a distribution of quantitative values using a set of summary statistics. The tick inside the box marks the median. The lower and upper edges of the box mark the first and third quartiles, respectively. Depending on the type of box plot, the ends of the whiskers represent either the most extreme data points within a multiple of the interquartile range from the quartiles, or the minimum and maximum of the data.
To create a box plot, set mark to "boxplot".
Documentation Overview
- Box Plot Mark Properties
- Types of Box Plot
- Dimension & Orientation
- The Parts of Box Plots
- Color, Size, and Opacity Encoding Channels
- Tooltip Encoding Channels
- Mark Config
- Box Plot with Pre-Calculated Summaries
Box Plot Mark Properties
A boxplot’s mark definition contains the following properties:
Property | Type | Description |
---|---|---|
type | BoxPlot | Required. The mark type. For a box plot, this is "boxplot". |
extent | String \| Number | The extent of the whiskers. Available options include "min-max" (the whiskers span from the minimum to the maximum of the data) and a number k (the whiskers span the data within the range [Q1 - k * IQR, Q3 + k * IQR], where Q1 and Q3 are the first and third quartiles and IQR is the interquartile range Q3 - Q1). Default value: 1.5. |
orient | String | Orientation of the box plot. This is normally determined automatically from the types of the fields on the x and y channels. However, an explicit orient can be specified when the orientation is ambiguous. |
size | Number | Size of the box and median tick of a box plot. |
color | Color \| Gradient \| ExprRef | Default color. |
opacity | Number | The opacity (a value in the range [0, 1]) of the mark. |
invalid | String \| Null | Invalid data mode, which defines how the marks and corresponding scales should represent invalid values. |
Besides the properties listed above, "box", "median", "rule", "outliers", and "ticks" can be used to specify the underlying mark properties for different parts of the box plot as well.
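As a minimal sketch of a mark definition that combines several of these properties, the specification below sets "size", "color", and "opacity" on the box plot and overrides the color of the "outliers" part. The inline data and the field names "group" and "value" are hypothetical and exist only for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch of box plot mark properties; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": {
    "type": "boxplot",
    "size": 30,
    "color": "teal",
    "opacity": 0.7,
    "outliers": {"color": "firebrick"}
  },
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```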
Types of Box Plot
Vega-Lite supports two types of box plots, defined by the extent property in the mark definition object.
- Tukey Box Plot is the default box plot in Vega-Lite. For a Tukey box plot, the whiskers span from the smallest data point to the largest data point within the range [Q1 - k * IQR, Q3 + k * IQR], where Q1 and Q3 are the first and third quartiles and IQR is the interquartile range (Q3 - Q1). In this type of box plot, you can specify the constant k by setting extent. Any outlier points beyond the whiskers are displayed using point marks.
By default, the extent is 1.5.
Explicitly setting extent to 1.5 produces a plot identical to the default.
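A Tukey box plot with an explicit extent might look like the following sketch. The inline data and the field names "group" and "value" are hypothetical; changing 1.5 to another number changes the constant k in the whisker range.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Tukey box plot with an explicit extent; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": {"type": "boxplot", "extent": 1.5},
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```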
- min-max Box Plot (extent set to "min-max") is a box plot where the lower and upper whiskers are defined as the minimum and maximum of the data, respectively. No points are considered outliers in this type of box plot.
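For comparison, a min-max box plot can be sketched by setting extent to "min-max" in the mark definition. As before, the inline data and field names are hypothetical.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Min-max box plot; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": {"type": "boxplot", "extent": "min-max"},
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```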
Dimension & Orientation
Vega-Lite supports both 1D and 2D box plots:
1D box plot shows the distribution of a continuous field.
A boxplot’s orientation is automatically determined by the continuous field axis. For example, you can create a vertical 1D box plot by encoding a continuous field on the y axis.
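A vertical 1D box plot can be sketched as follows, assuming a hypothetical inline dataset with a single quantitative field named "value". Only the y channel is encoded, so the box plot is drawn vertically.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Vertical 1D box plot; the inline data is hypothetical.",
  "data": {
    "values": [
      {"value": 12}, {"value": 14}, {"value": 15}, {"value": 17},
      {"value": 22}, {"value": 24}, {"value": 40}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "y": {"field": "value", "type": "quantitative"}
  }
}
```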
2D box plot shows the distribution of a continuous field, broken down by categories.
For 2D box plots with one continuous field and one discrete field, the box plot will be horizontal if the continuous field is on the x axis.
Alternatively, if the continuous field is on the y axis, the box plot will be vertical.
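The sketch below shows a horizontal 2D box plot, with the continuous field on x and the discrete field on y. The inline data and the field names "group" and "value" are hypothetical. Swapping the x and y channel definitions would give the vertical variant.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Horizontal 2D box plot; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "value", "type": "quantitative"},
    "y": {"field": "group", "type": "nominal"}
  }
}
```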
The Parts of Box Plots
Under the hood, the "boxplot" mark is a composite mark that expands into a layered plot, with separate layers for the whisker rules, the box, the median tick, and the outlier points.
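To illustrate the layered structure, here is a simplified, hand-written approximation of a vertical 1D box plot. It is a sketch, not the exact specification that Vega-Lite generates: the output field names ("q1", "lower", and so on) and the inline data are hypothetical, outlier points are omitted, and the whiskers end at the fence Q1 - 1.5 * IQR or Q3 + 1.5 * IQR (clamped to the data minimum and maximum) rather than at the nearest data point inside the fence.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Hand-written approximation of a 1D box plot; not Vega-Lite's exact expansion.",
  "data": {
    "values": [
      {"value": 12}, {"value": 14}, {"value": 15}, {"value": 17},
      {"value": 22}, {"value": 24}, {"value": 40}
    ]
  },
  "transform": [
    {
      "aggregate": [
        {"op": "q1", "field": "value", "as": "q1"},
        {"op": "median", "field": "value", "as": "median"},
        {"op": "q3", "field": "value", "as": "q3"},
        {"op": "min", "field": "value", "as": "min_value"},
        {"op": "max", "field": "value", "as": "max_value"}
      ]
    },
    {"calculate": "datum.q3 - datum.q1", "as": "iqr"},
    {"calculate": "max(datum.q1 - 1.5 * datum.iqr, datum.min_value)", "as": "lower"},
    {"calculate": "min(datum.q3 + 1.5 * datum.iqr, datum.max_value)", "as": "upper"}
  ],
  "layer": [
    {
      "mark": "rule",
      "encoding": {
        "y": {"field": "lower", "type": "quantitative", "title": "value"},
        "y2": {"field": "upper"}
      }
    },
    {
      "mark": {"type": "bar", "size": 14},
      "encoding": {
        "y": {"field": "q1", "type": "quantitative"},
        "y2": {"field": "q3"}
      }
    },
    {
      "mark": {"type": "tick", "color": "white", "size": 14, "orient": "horizontal"},
      "encoding": {
        "y": {"field": "median", "type": "quantitative"}
      }
    }
  ]
}
```

The layers are drawn in order: the whisker rule first, then the box as a ranged bar from the first to the third quartile, and finally the median tick on top of the box.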
To customize individual parts of the box plot, set the corresponding properties in the box plot mark definition or config.
For example, we can make the box plot’s "median" tick red by setting its "color" to "red", and make the box plot include end ticks by setting "ticks" to true.
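This customization can be sketched as the following specification. The inline data and the field names "group" and "value" are hypothetical; the "median" and "ticks" properties are the ones described above.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Box plot with a red median tick and end ticks; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": {
    "type": "boxplot",
    "median": {"color": "red"},
    "ticks": true
  },
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  }
}
```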
Color, Size, and Opacity Encoding Channels
You can customize the color, size, and opacity of the box in the boxplot by using the color, size, and opacity encoding channels. The size is applied to only the box and median tick. The color is applied to only the box and the outlier points. Meanwhile, the opacity is applied to the whole boxplot.
An example of a boxplot where the size encoding channel is specified.
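A minimal sketch of the size encoding channel, assuming hypothetical inline data with the fields "group" and "value". The size is given as a constant value here, and a color encoding on the "group" field is added to show the channels used together.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Box plot with size and color encoding channels; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"},
    "size": {"value": 8},
    "color": {"field": "group", "type": "nominal"}
  }
}
```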
Tooltip Encoding Channels
You can add custom tooltips to box plots. A custom tooltip overrides the box plot’s default tooltips.
If the field in the tooltip encoding is unaggregated, it replaces the tooltips of the outlier marks.
On the other hand, if the field in the tooltip encoding is aggregated, it replaces the tooltips of the box and whisker marks.
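As a sketch of the aggregated case, the specification below adds a tooltip showing the mean of a hypothetical "value" field, which would replace the tooltips of the box and whisker marks. Removing "aggregate" from the tooltip definition would turn it into the unaggregated case, which replaces the tooltips of the outlier marks instead. The inline data and field names are assumptions for illustration.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Box plot with an aggregated custom tooltip; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"},
    "tooltip": {"field": "value", "aggregate": "mean", "type": "quantitative"}
  }
}
```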
Mark Config
{
"boxplot": {
"size": ...,
"extent": ...,
"box": ...,
"median": ...,
"rule": ...,
"outliers": ...,
"ticks": ...
}
}
The boxplot config object sets the default properties for boxplot marks.
The boxplot config can contain all boxplot mark properties, but it currently does not support color, opacity, and orient. Please see issue #3934.
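A boxplot config might be used as in the following sketch, which sets a default size, switches the default extent to "min-max", and colors the median tick black for every boxplot mark in the specification. The inline data and the field names "group" and "value" are hypothetical.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Box plot defaults set through config; the inline data is hypothetical.",
  "data": {
    "values": [
      {"group": "A", "value": 12}, {"group": "A", "value": 14}, {"group": "A", "value": 15},
      {"group": "A", "value": 17}, {"group": "A", "value": 40}, {"group": "B", "value": 5},
      {"group": "B", "value": 22}, {"group": "B", "value": 24}, {"group": "B", "value": 25},
      {"group": "B", "value": 27}
    ]
  },
  "mark": "boxplot",
  "encoding": {
    "x": {"field": "group", "type": "nominal"},
    "y": {"field": "value", "type": "quantitative"}
  },
  "config": {
    "boxplot": {
      "size": 30,
      "extent": "min-max",
      "median": {"color": "black"}
    }
  }
}
```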
Box Plot with Pre-Calculated Summaries
If you have data summaries pre-calculated for a box plot, you can use layer to build the box plot from its individual parts.
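One way to do this is sketched below: each data record holds the summaries for one category, and separate layers draw the whiskers, the box, the median tick, and the outlier points. All field names ("lower", "q1", "median", "q3", "upper", "outliers") and all values are hypothetical; adapt them to however your summaries are stored.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Layered box plot from pre-calculated summaries; all values are hypothetical.",
  "data": {
    "values": [
      {"group": "A", "lower": 12, "q1": 14, "median": 15, "q3": 17, "upper": 20, "outliers": [40]},
      {"group": "B", "lower": 21, "q1": 22, "median": 24, "q3": 26, "upper": 27, "outliers": [5]}
    ]
  },
  "encoding": {
    "y": {"field": "group", "type": "nominal", "title": null}
  },
  "layer": [
    {
      "mark": "rule",
      "encoding": {
        "x": {"field": "lower", "type": "quantitative", "title": "value"},
        "x2": {"field": "upper"}
      }
    },
    {
      "mark": {"type": "bar", "size": 14},
      "encoding": {
        "x": {"field": "q1", "type": "quantitative"},
        "x2": {"field": "q3"}
      }
    },
    {
      "mark": {"type": "tick", "color": "white", "size": 14},
      "encoding": {
        "x": {"field": "median", "type": "quantitative"}
      }
    },
    {
      "transform": [{"flatten": ["outliers"]}],
      "mark": "point",
      "encoding": {
        "x": {"field": "outliers", "type": "quantitative"}
      }
    }
  ]
}
```

The shared y encoding is declared once at the top level so that every layer is positioned by the same category, and the flatten transform turns each array of outliers into one record per point.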