In my last post I promised to cover the Segment iterator in more detail, so here we go.
The Segment iterator partitions rows into groups as they flow through a query plan, checking whether the current row belongs in the same group as the previous row. For this to work, the incoming rows must be presented in an order which guarantees that all members of a group are received sequentially.
Segment has a “Group By” argument to specify how it should partition its input. It adds an additional column as rows flow through it, which is used to communicate with its parent iterator. This extra column is named something like [Segment1003] and is visible in the graphical query plan's properties window, or by hovering your mouse cursor over the iterator (the attribute name in both cases is 'Segment Column').
Like almost all iterators, Segment processes rows one at a time, and signals the start of a new group in the Segment Column. Let's look again at the query plan from my last post showing Segment followed by Top:
The index scan produces rows in an order suitable for the Segment to split into groups. The Top iterator passes on some fraction of the rows it receives as normal, and uses the information in the Segment Column to decide when it should reset its count for the next group.
The effect is as if the Segment passed a group of rows all-at-once, and the Top occurred once for each group. You can think of the iterator logically passing a group at a time, whereas physically it processes row-by-row, communicating the state of play with the following Top via the Segment Column.
In case you are wondering if it wouldn't be more efficient for the Segment to pass a group a time to the Top, consider what the Segment would do with the rows which were piling up, waiting for the group to change. We can imagine a scenario where a very large number of wide rows occur for a particular group. In that case, Segment would require either a very large memory grant, or a worktable in tempdb to buffer the rows.
We know that Segment does not request a memory grant, because it would show up in the query plan after execution, and in the sys.dm_exec_query_memory_grants dynamic management view. We can see that a tempdb worktable is not used, by examining the output of the query with SET STATISTICS IO ON. With this option enabled, any worktables used would be reported - but nothing is shown. So, with nowhere to store a potentially huge number of pending group rows, the physical implementation of Segment must be row-by-row.
An interesting consequence of this physical processing is that the Top iterator is "Segment aware". The Segment Column is bound to the Top at compilation time, so that a Top knows which Segment Column it is bound to, in a plan with multiple such iterators.
Segment and Sequence Project
In the ranking function example in my last post, I used DENSE_RANK to determine which rows to output. The query plan also included the Segment iterator, along with a Sequence Project. In this next section, we'll look at the interaction between Segment and Sequence Project, and discover why DENSE_RANK requires two Segment iterators.
The two Segments and the Sequence Project work together to produce the result of the expression DENSE_RANK() OVER (PARTITION BY Shelf, Bin ORDER BY Quantity).
The first Segment has a GROUP BY argument of Shelf and Bin - mirroring the PARTITION BY clause. It uses a Segment Column labelled [Segment1003] to indicate whether the current row is the first of a new group or not.
The second Segment is required because the value of DENSE_RANK changes when the current value is different from the previous row (as well as resetting at the start of a new group). To accommodate this, the query plan needs a second flag, stored in the second Segment Column, [Segment1004].
With these two flags ('start of group' and 'value changed') the Sequence Project can decide whether to increment, reset, or leave unchanged the current internal value for the DENSE_RANK function.
The Sequence Project adds the current internal value as a new column in the data stream. This can be seen in the Defined Values argument of the Sequence Project. In this example, the new column is called [Expr1002].
In the next post in this series, I'll take a look at common-subexpression spools used by windowed aggregates like COUNT OVER and by per-index (wide) update plans.
Operator of the day: Segment - Connor Cunningham
Ranking functions: ROW_NUMBER - Craig Freedman
© Paul White