Modes for Handling Invalid Data


This page discusses modes in Vega-Lite for handling invalid data (null and NaN in continuous scales).

The main configurations are mark.invalid and config.scale.invalid. In addition, you can use other Vega-Lite features, such as conditional encoding, layering, or the window transform, to handle invalid and missing data.

Note: Vega-Lite does not consider null and NaN in categorical scales and text encodings as invalid data:

  • Categorical scales can treat nulls and NaNs as separate categories
  • Similarly, text encodings can directly display nulls and NaNs.


Mark Invalid Mode

You can use mark.invalid (or config.mark.invalid) to configure how marks and their corresponding scales handle invalid data (null and NaN in continuous scales).

Property: invalid
Type: String | Null
Description: Invalid data mode, which defines how the marks and corresponding scales should represent invalid values (null and NaN in continuous scales without a defined output for invalid values).

  • "filter" — Exclude all invalid values from the visualization’s marks and scales. For path marks (for line, area, trail), this option will create paths that connect valid points, as if the data rows with invalid values do not exist.

  • "break-paths-filter-domains" — Break path marks (for line, area, trail) at invalid values. For non-path marks, this is equivalent to "filter". All scale domains will exclude these filtered data points.

  • "break-paths-show-domains" — Break paths (for line, area, trail) at invalid values. Hide invalid values for non-path marks. All scale domains will include these filtered data points (for both path and non-path marks).

  • "show" or null — Show all data points in the marks and scale domains. Each scale uses the output for invalid values defined in config.scale.invalid; if none is specified, invalid values produce the same visual values as zero (if the scale includes zero) or the minimum value (if the scale does not include zero).

  • "break-paths-show-path-domains" (default) — This is equivalent to "break-paths-show-domains" for path-based marks (line/area/trail) and "filter" for non-path marks.

Note: If a channel’s scale has an output for invalid values defined in config.scale.invalid, all values for that scale are considered “valid”, since the scale can produce a reasonable output for them. Thus, fields for such channels will not be filtered and will not cause path breaks.
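As a minimal sketch, assuming a Vega-Lite 5 release recent enough to support these modes (the three-row dataset is illustrative, not one of this page's examples), the mode can be set for every mark through config.mark.invalid and overridden for a single mark through that mark's own invalid property. JSON has no comment syntax, so the spec's description fields serve as comments:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "config.mark.invalid sets the default mode; the point layer overrides it.",
  "data": {
    "values": [{"a": -5, "b": -25}, {"a": 0, "b": null}, {"a": 5, "b": 40}]
  },
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  },
  "layer": [
    {"description": "Uses the config default (filter).", "mark": "line"},
    {
      "description": "Mark-level override: show invalid values.",
      "mark": {"type": "point", "invalid": "show"}
    }
  ],
  "config": {"mark": {"invalid": "filter"}}
}
```

Following the mode descriptions above, the line should connect (-5, -25) directly to (5, 40), while the point layer should still draw the row with a null b, at the zero position of the y scale.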

Examples

To understand how these modes affect common marks, see the examples below, which visualize this dataset:

[
  {"a": null, "b": 100},
  {"a": -10, "b": null},
  {"a": -5, "b": -25},
  {"a": -1, "b": -20},
  {"a": 0, "b": null},
  {"a": 1, "b": 30},
  {"a": 5, "b": 40},
  {"a": 10, "b": null}
]

by assigning "a" to the x-axis (as a quantitative field and as an ordinal field) and "b" to the y-axis.
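A minimal sketch of that setup might place the two variants side by side. No invalid mode is set, so the default applies; the line-with-points mark is an assumption, since the examples cover several marks:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Base setup: the same data with a quantitative and an ordinal x-axis.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "hconcat": [
    {
      "description": "Quantitative x: a null a is invalid here.",
      "mark": {"type": "line", "point": true},
      "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"}
      }
    },
    {
      "description": "Ordinal x: a null a becomes its own category.",
      "mark": {"type": "line", "point": true},
      "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
      }
    }
  ]
}
```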

"filter"

The "filter" invalid mode excludes all invalid values from the visualization’s marks and scales.

For path marks (for line, area, trail), this option will create paths that connect valid points, as if the points with invalid values do not exist.
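A minimal sketch of this mode on a line chart of the dataset above (the mark choice is illustrative):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "filter: rows with invalid a or b are dropped from marks and domains.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "mark": {"type": "line", "point": true, "invalid": "filter"},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```

Here the line should run straight from a = -1 to a = 1, skipping a = 0, and because the row with a null a is also excluded from the scales, its b value of 100 should not stretch the y domain.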

"break-paths-filter-domains"

Break path marks (for line, area, trail) at invalid values. For non-path marks, this is equivalent to "filter". All scale domains will exclude these filtered data points.
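A minimal sketch of this mode, using an area mark to make the breaks visible (the mark choice is illustrative):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "break-paths-filter-domains: the area breaks at null b values; domains exclude invalid rows.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "mark": {"type": "area", "invalid": "break-paths-filter-domains"},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```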

"break-paths-show-domains"

This option is like "break-paths-filter-domains", except that all scale domains will instead include these filtered data points.
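A minimal sketch of this mode on a plain line mark (illustrative):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "break-paths-show-domains: the line breaks at null b values; domains keep the filtered rows.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "mark": {"type": "line", "invalid": "break-paths-show-domains"},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```

Compared with "break-paths-filter-domains", the line should break in the same places, but the y domain should now extend to 100, the b value of the row whose a is null, even though that row is not drawn.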

"show"

Include all data points in the marks and scale domains. Each scale uses the output for invalid values defined in config.scale.invalid; if none is specified, invalid values produce the same visual values as zero (if the scale includes zero) or the minimum value (if the scale does not include zero).
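A minimal sketch of this mode on a point mark (illustrative):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "show: every row is drawn; invalid values map to zero or the scale minimum.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "mark": {"type": "point", "invalid": "show"},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
```

Both domains in this dataset span negative and positive values and so include zero; points with a null a or b should therefore be drawn at the zero position of that axis, which can make them indistinguishable from genuine zeros.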

"break-paths-show-path-domains" (Default)

For historical reasons, Vega-Lite 5 currently uses "break-paths-show-path-domains" as the default invalid data mode (to avoid breaking changes). This is equivalent to "break-paths-show-domains" for path-based marks (line/area/trail) and "filter" for other marks.
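As a sketch, the default can be written out explicitly; per the text above, this should behave the same as omitting invalid altogether. Separate line and point layers are used here (an illustrative choice) so that one path mark and one non-path mark share the scales:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Explicit default: the line layer breaks paths and keeps domains; the point layer filters.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"}
  },
  "layer": [{"mark": "line"}, {"mark": "point"}],
  "config": {"mark": {"invalid": "break-paths-show-path-domains"}}
}
```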

Scale Output for Invalid Values

You can use config.scale.invalid to define scale outputs per channel for invalid values.

Property: invalid
Type: ScaleInvalidDataConfig
Description: An object that defines scale outputs per channel for invalid values (nulls and NaNs on a continuous scale).

  • The keys in this object are the scale channels.
  • Each value is either "zero-or-min" (use zero if the scale includes zero, or the minimum value otherwise) or a value definition {value: ...}.

Example: Setting config.scale.invalid to {color: {value: '#aaa'}} makes the visualization color all invalid values ‘#aaa’.
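The two kinds of values described above might be combined as in the following fragment. The channel choices are illustrative assumptions; the config block would be merged into a complete spec:

```json
{
  "description": "Config fragment: gray for invalid colors, zero-or-min for invalid sizes.",
  "config": {
    "scale": {
      "invalid": {
        "color": {"value": "#aaa"},
        "size": "zero-or-min"
      }
    }
  }
}
```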

See https://vega.github.io/vega-lite/docs/invalid-data.html for more details.

Example: Output Color and Size with “Filter” Mode

A visualization with the "filter" invalid data mode will not filter out (exclude) data points with invalid color and size values if config.scale.invalid.color and config.scale.invalid.size are specified.
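A minimal sketch of such a spec, using a hypothetical dataset with an extra field "c" mapped to both color and size (the data and the size value of 20 are illustrative):

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "filter mode, but color and size have invalid outputs, so null c rows are kept.",
  "data": {
    "values": [
      {"a": 1, "b": 10, "c": 5}, {"a": 2, "b": 20, "c": null},
      {"a": 3, "b": 15, "c": 8}, {"a": 4, "b": null, "c": 2}
    ]
  },
  "mark": {"type": "point", "invalid": "filter"},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"},
    "color": {"field": "c", "type": "quantitative"},
    "size": {"field": "c", "type": "quantitative"}
  },
  "config": {
    "scale": {"invalid": {"color": {"value": "#aaa"}, "size": {"value": 20}}}
  }
}
```

The row with a null c should remain on the chart, drawn gray at size 20, whereas the row with a null b, whose y channel has no configured output, should still be filtered out.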

Compare this with a similar spec, but without config.scale.invalid:

Example: Output Color with “Show” Mode

As discussed earlier, by default invalid values will produce the same visual values as zero (if the scale includes zero) or the minimum value (if the scale does not include zero).

However, you may use config.scale.invalid to override the output for invalid data values:
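As a sketch with a hypothetical four-cell heatmap (field names "col", "row", and "v" are illustrative), the missing cell is assigned gray instead of a color from the scale:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "show mode heatmap: the null cell is gray rather than the color of the minimum value.",
  "data": {
    "values": [
      {"col": "A", "row": "P", "v": 1}, {"col": "A", "row": "Q", "v": null},
      {"col": "B", "row": "P", "v": 3}, {"col": "B", "row": "Q", "v": 4}
    ]
  },
  "mark": {"type": "rect", "invalid": "show"},
  "encoding": {
    "x": {"field": "col", "type": "nominal"},
    "y": {"field": "row", "type": "nominal"},
    "color": {"field": "v", "type": "quantitative"}
  },
  "config": {"scale": {"invalid": {"color": {"value": "#aaa"}}}}
}
```

Without the config block, the rule above implies that the null cell would likely share the color of the smallest value, 1, since this color domain (1 to 4) does not include zero.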

Other Solutions

Note that mark.invalid and config.scale.invalid are options for handling invalid data without changing data or marks.

However, you may use other Vega-Lite features such as conditional encoding, layering, and window transforms to encode invalid data.

Example: Conditional Encoding

If you do not otherwise use the color channel, you can use a conditional color encoding to draw invalid values in a specific color (e.g., gray).
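A minimal sketch, assuming the Vega expression function isValid (true when a value is not null, undefined, or NaN). The "show" mode keeps invalid rows on the chart so that the condition has something to color, and "steelblue" as the default color is an arbitrary choice:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Gray out rows whose a or b is invalid; show mode keeps them on the chart.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "mark": {"type": "point", "invalid": "show"},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b", "type": "quantitative"},
    "color": {
      "condition": {"test": "!isValid(datum.a) || !isValid(datum.b)", "value": "#aaa"},
      "value": "steelblue"
    }
  }
}
```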

Example: Layering

You may also use different marks (such as bars) to encode null data.
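A minimal sketch of the layering approach. To keep the example short and predictable, it substitutes a dashed rule for the bars mentioned above: a rule mark with only an x encoding spans the full chart height, so it marks each a value whose b is missing. The filter expression relies on the Vega expression function isValid:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Line for valid data plus dashed rules at x positions where b is missing.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "layer": [
    {
      "mark": {"type": "line", "point": true},
      "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"}
      }
    },
    {
      "description": "Keep only rows with a valid a but a missing b.",
      "transform": [{"filter": "isValid(datum.a) && !isValid(datum.b)"}],
      "mark": {"type": "rule", "color": "#aaa", "strokeDash": [4, 4]},
      "encoding": {"x": {"field": "a", "type": "quantitative"}}
    }
  ]
}
```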

Example: Using window transform to impute missing values
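A minimal sketch of window-based imputation: each missing b is replaced by the mean of b over the previous, current, and next rows ordered by a. The frame [-1, 1], the field names "b_neighbor_mean" and "b_imputed", and the initial filter that drops the row with a null a are illustrative choices. The approach relies on Vega's mean aggregate skipping null values, so each average uses only the valid neighbors:

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Impute missing b as the mean of b over neighboring rows ordered by a.",
  "data": {
    "values": [
      {"a": null, "b": 100}, {"a": -10, "b": null},
      {"a": -5, "b": -25}, {"a": -1, "b": -20},
      {"a": 0, "b": null}, {"a": 1, "b": 30},
      {"a": 5, "b": 40}, {"a": 10, "b": null}
    ]
  },
  "transform": [
    {"filter": "isValid(datum.a)"},
    {
      "window": [{"op": "mean", "field": "b", "as": "b_neighbor_mean"}],
      "sort": [{"field": "a", "order": "ascending"}],
      "frame": [-1, 1]
    },
    {
      "calculate": "isValid(datum.b) ? datum.b : datum.b_neighbor_mean",
      "as": "b_imputed"
    }
  ],
  "mark": {"type": "line", "point": true},
  "encoding": {
    "x": {"field": "a", "type": "quantitative"},
    "y": {"field": "b_imputed", "type": "quantitative"}
  }
}
```

Because every plotted y value is now valid, no invalid mode is needed. A wider frame smooths the imputed values more, at the cost of drawing on less local data.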