Window
Edit this pageThe window transform performs calculations over sorted groups of data objects. These calculations including ranking, lead/lag analysis, and aggregates such as running sums and averages. Calculated values are written back to the input data stream.
Documentation Overview
Window Field Definition
// A View Specification
{
...
"transform": [
{
// Window Transform
"window": [{
"op": ...,
"field": ...,
"param": ...
"as": ...
}],
"sort": [
{"field": ..., "order": ...}
],
"ignorePeers": ...,
"groupby": [
"..."
],
"frame": [...,...]
}
...
],
...
}
Window Transform Definition
Property | Type | Description |
---|---|---|
window | WindowFieldDef[] |
Required. The definition of the fields in the window, and what calculations to use. |
frame | (Null | Number)[] |
A frame specification as a two-element array indicating how the sliding window should proceed. The array entries should either be a number indicating the offset from the current data object, or null to indicate unbounded rows preceding or following the current data object. The default value is Default value:: |
ignorePeers | Boolean |
Indicates if the sliding window frame should ignore peer values. (Peer values are those considered identical by the sort criteria). The default is false, causing the window frame to expand to include all peer values. If set to true, the window frame will be defined by offset values only. This setting only affects those operations that depend on the window frame, namely aggregation operations and the first_value, last_value, and nth_value window operations. Default value: |
groupby | String[] |
The data fields for partitioning the data objects into separate windows. If unspecified, all data points will be in a single group. |
sort | SortField[] |
A sort field definition for sorting data objects within a window. If two data objects are considered equal by the comparator, they are considered “peer” values of equal rank. If sort is not specified, the order is undefined: data objects are processed in the order they are observed and none are considered peers (the ignorePeers parameter is ignored and treated as if set to |
Window Transform Field Definition
Property | Type | Description |
---|---|---|
op | String | String |
Required. The window or aggregation operations to apply within a window, including |
param | Number |
Parameter values for the window functions. Parameter values can be omitted for operations that do not accept a parameter. See the list of all supported operations and their parameters here. |
field | String |
The data field for which to compute the aggregate or window function. This can be omitted for window functions that do not operate over a field such as |
as | String |
Required. The output name for the window operation. |
Sort Field Definition
Property | Type | Description |
---|---|---|
field | String |
Required. The name of the field to sort. |
order | String |
Whether to sort the field in ascending or descending order. |
Window Only Operation Reference
The valid operations include all aggregate operations plus the following window operations.
Operation | Parameter | Description |
---|---|---|
row_number | None | Assigns each data object a consecutive row number, starting from 1. |
rank | None | Assigns a rank order value to each data object in a window, starting from 1. Peer values are assigned the same rank. Subsequent rank scores incorporate the number of prior values. For example, if the first two values tie for rank 1, the third value is assigned rank 3. |
dense_rank | None | Assigns dense rank order values to each data object in a window, starting from 1. Peer values are assigned the same rank. Subsequent rank scores do not incorporate the number of prior values. For example, if the first two values tie for rank 1, the third value is assigned rank 2. |
percent_rank | None | Assigns a percentage rank order value to each data object in a window. The percent is calculated as (rank - 1) / (group_size - 1). |
cume_dist | None | Assigns a cumulative distribution value between 0 and 1 to each data object in a window. |
ntile | Number | Assigns a quantile (e.g., percentile) value to each data object in a window. Accepts an integer parameter indicating the number of buckets to use (e.g., 100 for percentiles, 5 for quintiles). |
lag | Number | Assigns a value from the data object that precedes the current object by a specified number of positions. If no such object exists, assigns null . Accepts an offset parameter (default 1 ) that indicates the number of positions. This operation must have a corresponding entry in the fields parameter array. |
lead | Number | Assigns a value from the data object that follows the current object by a specified number of positions. If no such object exists, assigns null . Accepts an offset parameter (default 1 ) that indicates the number of positions. This operation must have a corresponding entry in the fields parameter array. |
first_value | None | Assigns a value from the first data object in the current sliding window frame. This operation must have a corresponding entry in the fields parameter array. |
last_value | None | Assigns a value from the last data object in the current sliding window frame. This operation must have a corresponding entry in the fields parameter array. |
nth_value | Number | Assigns a value from the nth data object in the current sliding window frame. If no such object exists, assigns null . Requires a non-negative integer parameter that indicates the offset from the start of the window frame. This operation must have a corresponding entry in the fields parameter array. |
Examples
Below are some common use cases for the window transform.
Cumulative Frequency Distribution
Here we use the window transform with frame: [null, 0]
to accumulate count in a cumulative frequency distribution plot.
See also: layered histogram and cumulative histogram
Percent of Total
Here we use the window transform to derive the global sum so that we can calculate percentage.
Difference from Mean
One example is to show the “exemplar” movies from a movie collection. Here “exemplar” is defined by having a score of 2.5 points higher than the global average.
Another example is to show the “exemplar” movies based on the release year average. Here “exemplar” is defined by having a score 2.5 points higher than the annual average for its release year (instead of the global average).
Rather than filtering the above two examples we can also calculate a residual by deriving the mean using the window transform first.
Rank Chart
We can also use rank
operator to calculate ranks over time.
Top K
Here we use window transform to derive the total number of students along with the rank of the current student to determine the top K students and display their score.
Cumulative Running Average
Here we use window transform to visualize how the average MPG for vehicles have changed over the years.