Class GroupByNode

  • All Implemented Interfaces:
    Optimizable, Visitable

    class GroupByNode
    extends SingleChildResultSetNode
    A GroupByNode represents a result set for a grouping operation on a select. Note that this includes a SELECT with aggregates and no grouping columns (in which case the grouping list is null). It has the same description as its input result set.

    For the most part, it simply delegates operations to its bottomPRSet, which is currently expected to be a ProjectRestrictResultSet generated for a SelectNode.

    NOTE: A GroupByNode extends FromTable (indirectly, through SingleChildResultSetNode) since it can exist in a FromList.

    There is a lot of room for optimizations here:

    • agg(distinct x) group by x => agg(x) group by x (for min and max)
    • min()/max() could use index scans where possible, in which case no sort may be needed.
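The first item depends on MIN and MAX being insensitive to duplicates: removing repeated values from a group cannot change its smallest or largest element, whereas it does change SUM. The following minimal sketch checks this per group over hypothetical in-memory rows; the class, record, and method names are illustrative only and are not part of Derby.

```java
import java.util.*;
import java.util.function.ToIntFunction;

// Hypothetical illustration, not Derby code: each row is a (group key, value) pair.
class DistinctMinMaxDemo {
    record Row(int key, int value) {}

    // Group rows by key and apply agg to each group's values,
    // first removing duplicate values when distinct is true.
    static Map<Integer, Integer> aggregate(List<Row> rows, boolean distinct,
                                           ToIntFunction<List<Integer>> agg) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : groups.entrySet()) {
            List<Integer> values = e.getValue();
            if (distinct) {
                values = new ArrayList<>(new LinkedHashSet<>(values)); // drop duplicate values
            }
            result.put(e.getKey(), agg.applyAsInt(values));
        }
        return result;
    }

    static int min(List<Integer> v) { return Collections.min(v); }
    static int max(List<Integer> v) { return Collections.max(v); }
    static int sum(List<Integer> v) { return v.stream().mapToInt(Integer::intValue).sum(); }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 5), new Row(1, 5), new Row(1, 2),
                                 new Row(2, 7), new Row(2, 7));
        // MIN gives the same answer with or without DISTINCT ...
        System.out.println(aggregate(rows, true, DistinctMinMaxDemo::min));  // {1=2, 2=7}
        System.out.println(aggregate(rows, false, DistinctMinMaxDemo::min)); // {1=2, 2=7}
        // ... but SUM does not, which is why the rewrite is limited to MIN and MAX.
        System.out.println(aggregate(rows, true, DistinctMinMaxDemo::sum));  // {1=7, 2=7}
        System.out.println(aggregate(rows, false, DistinctMinMaxDemo::sum)); // {1=12, 2=14}
    }
}
```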
    • Field Detail

      • groupingList

        GroupByList groupingList
        The GROUP BY list
      • aggregates

        private java.util.List<AggregateNode> aggregates
        The list of all aggregates in the query block that contains this group by.
      • aggInfo

        private AggregatorInfoList aggInfo
        Information that is used at execution time to process aggregates.
      • parent

        FromTable parent
        The parent of the GroupByNode. If a ProjectRestrict needs to be generated over the group by, this is set to that node. Otherwise it is null.
      • addDistinctAggregate

        private boolean addDistinctAggregate
      • singleInputRowOptimization

        private boolean singleInputRowOptimization
      • addDistinctAggregateColumnNum

        private int addDistinctAggregateColumnNum
      • isInSortedOrder

        private final boolean isInSortedOrder
      • havingClause

        private ValueNode havingClause
    • Constructor Detail

      • GroupByNode

        GroupByNode(ResultSetNode bottomPR,
                    GroupByList groupingList,
                    java.util.List<AggregateNode> aggregates,
                    ValueNode havingClause,
                    SubqueryList havingSubquerys,
                    int nestingLevel,
                    ContextManager cm)
             throws StandardException
        Constructor for a GroupByNode.
        Parameters:
        bottomPR - The child FromTable
        groupingList - The GROUP BY list
        aggregates - The list of aggregates from the query block. Since aggregation is done at the same time as grouping, we need them here.
        havingClause - The having clause.
        havingSubquerys - subqueries in the having clause.
        nestingLevel - The nesting level of this GroupByNode. It is used for error checking of GROUP BY queries with a HAVING clause.
        cm - The context manager
        Throws:
        StandardException - Thrown on error
    • Method Detail

      • getIsInSortedOrder

        boolean getIsInSortedOrder()
        Get whether or not the source is in sorted order.
        Returns:
        Whether or not the source is in sorted order.
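Presumably the reason to track this is that, when the source rows already arrive ordered on the grouping columns, each group is a contiguous run of rows and can be aggregated in a single pass without a separate sort. The sketch below illustrates that streaming technique under this assumption; `SortedGroupingDemo`, `Row`, `Group`, and `sumByKey` are hypothetical names, not Derby API.

```java
import java.util.*;

// Hypothetical row and group types for illustration; not Derby classes.
class SortedGroupingDemo {
    record Row(int key, int value) {}
    record Group(int key, long sum) {}

    // Assumes sortedRows is ordered by key, so each group is a contiguous run:
    // a group is finished as soon as the key changes, with no sort required.
    static List<Group> sumByKey(List<Row> sortedRows) {
        List<Group> out = new ArrayList<>();
        boolean inGroup = false;
        int currentKey = 0;
        long runningSum = 0;
        for (Row r : sortedRows) {
            if (inGroup && r.key() != currentKey) {
                out.add(new Group(currentKey, runningSum)); // key changed: emit the finished group
                runningSum = 0;
            }
            inGroup = true;
            currentKey = r.key();
            runningSum += r.value();
        }
        if (inGroup) {
            out.add(new Group(currentKey, runningSum)); // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 10), new Row(1, 5), new Row(3, 2),
                                 new Row(4, 1), new Row(4, 1));
        for (Group g : sumByKey(rows)) {
            System.out.println(g.key() + " -> " + g.sum());
        }
    }
}
```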
      • addAggregates

        private void addAggregates()
                            throws StandardException
        Add the extra result columns required by the aggregates to the result list.
        Throws:
        StandardException - Thrown on error
      • addDistinctAggregatesToOrderBy

        private void addDistinctAggregatesToOrderBy()
        Add any distinct aggregates to the order by list. Asserts that there are 0 or more distincts.
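One plausible reason for adding a distinct aggregate's column to the ordering is that, once rows are sorted on the grouping columns and then on that column, duplicate argument values within a group become adjacent, so DISTINCT reduces to skipping a value equal to the previous row's. The sketch below shows that technique for SUM(DISTINCT x); it is an assumption about the purpose, and all names in it are hypothetical.

```java
import java.util.*;

// Hypothetical rows for: select g, sum(distinct x) ... group by g (not Derby code).
class DistinctViaSortDemo {
    record Row(int groupKey, int x) {}

    // Order on the grouping column first, then on the DISTINCT aggregate's argument.
    static void sortForDistinct(List<Row> rows) {
        rows.sort(Comparator.comparingInt(Row::groupKey).thenComparingInt(Row::x));
    }

    // SUM(DISTINCT x) per group over sorted rows: equal x values within a group
    // are adjacent, so a duplicate is detected by comparing with the previous row.
    static Map<Integer, Long> sumDistinct(List<Row> sortedRows) {
        Map<Integer, Long> result = new LinkedHashMap<>();
        Row previous = null;
        for (Row r : sortedRows) {
            boolean duplicate = previous != null
                    && previous.groupKey() == r.groupKey()
                    && previous.x() == r.x();
            if (!duplicate) {
                result.merge(r.groupKey(), (long) r.x(), Long::sum);
            }
            previous = r;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(List.of(new Row(2, 4), new Row(1, 3),
                new Row(1, 3), new Row(2, 4), new Row(1, 5)));
        sortForDistinct(rows);
        System.out.println(sumDistinct(rows)); // {1=8, 2=4}
    }
}
```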
      • addNewPRNode

        private void addNewPRNode()
                           throws StandardException
        Add a new PR node for aggregation. Put the new PR under the sort.
        Throws:
        StandardException - Thrown on error
      • addUnAggColumns

        private java.util.ArrayList<SubstituteExpressionVisitor> addUnAggColumns()
                                                                          throws StandardException
        During the query rewrite for GROUP BY, add the columns on which the grouping is done.
        Returns:
        A list of havingRefsToSubstitute visitors. They are returned rather than applied because it is too early to apply them: the AggregateNodes must remain unmodified until after the new columns for aggregation have been added (DERBY-4071).
        Throws:
        StandardException
        See Also:
        addNewColumnsForAggregation()
      • addNewColumnsForAggregation

        private void addNewColumnsForAggregation()
                                          throws StandardException
        Add the many columns needed for aggregation. For each aggregate, three columns are added: the aggregate input expression, the aggregator column, and a column in which the aggregate result is stored. The input expression is taken directly from the aggregator node. The aggregator is the run-time aggregator; it is added to the RC list as a new object coming into the sort node.

        When this method is invoked, the tree looks like this:

          PR - (PARENT): RCL is the original select list
                |
          PR - GROUP BY: RCL is empty
                |
          PR - FROM TABLE: RCL is empty

        For each ColumnReference in the PR RCL:

        • clone the ref
        • create a new RC in the bottom RCL and set it to the col ref
        • create a new RC in the GROUPBY RCL and set it to point to the bottom RC
        • reset the top PR ref to point to the new GROUPBY RC
        For each aggregate in aggregates:
        • create an RC in FROM TABLE and fill it with the aggregate's operand
        • create RC in FROM TABLE for agg result
        • create RC in FROM TABLE for aggregator
        • create RC in GROUPBY for agg input, set it to point to FROM TABLE RC
        • create RC in GROUPBY for agg result
        • create RC in GROUPBY for aggregator
        • replace Agg with reference to RC for agg result
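The steps above can be traced with a toy model in which each RC either holds a label or points at the RC it reads from in the node below. This is a sketch, not Derby's ResultColumn API: `RcWiringDemo`, `RC`, `addGroupingColumn`, and `addAggregate` are hypothetical, and only the pointer from the GROUPBY input RC to the FROM TABLE RC is wired, because that is the only link the steps spell out. For the example query `select c1, sum(c2), max(c3) from t1 group by c1`, it yields the GroupByNode positions 0, 1 and 4 for the parent's columns.

```java
import java.util.*;

// Toy model of the rewrite, not Derby's ResultColumn API: an RC either holds an
// expression label or points at the RC it reads from in the node below.
class RcWiringDemo {
    static final class RC {
        final String label;
        final RC source; // RC in the child node, or null if this RC holds its own value
        RC(String label, RC source) { this.label = label; this.source = source; }
        // Follow the pointer chain down to the RC that holds the expression.
        String resolve() { return source == null ? label : source.resolve(); }
    }

    final List<RC> bottom = new ArrayList<>();  // PR over the FROM TABLE
    final List<RC> groupBy = new ArrayList<>(); // GroupByNode RCL
    final List<RC> top = new ArrayList<>();     // parent PR (original select list)

    // Grouping column: the bottom RC holds the cloned reference, the GROUPBY RC
    // points at it, and the top reference is reset to the GROUPBY RC.
    void addGroupingColumn(String column) {
        RC b = new RC(column, null);
        bottom.add(b);
        RC g = new RC(column, b);
        groupBy.add(g);
        top.add(g);
    }

    // Aggregate: three FROM TABLE RCs (operand, result, aggregator) and three GROUPBY
    // RCs in the order result, input, aggregator; the top reference moves to the result.
    void addAggregate(String function, String operand) {
        RC bInput = new RC(operand, null);
        bottom.addAll(List.of(bInput, new RC("<agg-result>", null), new RC("<aggregator>", null)));
        RC gResult = new RC(function + "(" + operand + ")", null);
        groupBy.addAll(List.of(gResult, new RC("<agg-input>", bInput), new RC("<aggregator>", null)));
        top.add(gResult); // replace the aggregate with a reference to its result RC
    }

    // Position, within the GroupByNode RCL, of each column the parent PR reads.
    List<Integer> topPositions() {
        List<Integer> positions = new ArrayList<>();
        for (RC rc : top) positions.add(groupBy.indexOf(rc));
        return positions;
    }

    public static void main(String[] args) {
        RcWiringDemo tree = new RcWiringDemo();
        tree.addGroupingColumn("C1");
        tree.addAggregate("SUM", "C2");
        tree.addAggregate("MAX", "C3");
        System.out.println(tree.topPositions());           // [0, 1, 4]
        System.out.println(tree.groupBy.get(2).resolve()); // C2
    }
}
```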

        For a query like,

                  select c1, sum(c2), max(c3)
                  from t1 
                  group by c1;
                  
        the query tree ends up looking like this:
                    ProjectRestrictNode RCL -> (ptr to GBN(column[0]), ptr to GBN(column[1]), ptr to GBN(column[4]))
                              |
                    GroupByNode RCL->(C1, SUM(C2), <agg-input>, <aggregator>, MAX(C3), <agg-input>, <aggregator>)
                              |
                    ProjectRestrict RCL->(C1, C2, C3)
                              |
                    FromBaseTable
                    
        The RCL of the GroupByNode contains all the unaggregated (grouping) columns, followed by three RCs for each aggregate, in this order: the final computed aggregate value, the aggregate input, and the aggregator function.

        The aggregator function puts its result in the first of the three RCs, and the PR result set in turn picks up the value from there.

        The notation (ptr to GBN(column[0])) basically means that it is a pointer to the 0th RC in the RCL of the GroupByNode.
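To make the value flow concrete, the following sketch simulates one group of the example query at run time: a row laid out like the GroupByNode RCL, where each aggregator accumulates its input slot and then writes its result into the first of its three slots, and a parent projection reads columns 0, 1 and 4. The `Aggregator` interface and the `Sum` and `Max` classes are hypothetical stand-ins, not Derby's run-time aggregator API.

```java
import java.util.*;

class GroupRowDemo {
    // Hypothetical stand-in for a run-time aggregator; not Derby's interface.
    interface Aggregator {
        void accumulate(int input);
        int result();
    }

    static final class Sum implements Aggregator {
        private int total;
        public void accumulate(int input) { total += input; }
        public int result() { return total; }
    }

    static final class Max implements Aggregator {
        private int max = Integer.MIN_VALUE;
        public void accumulate(int input) { max = Math.max(max, input); }
        public int result() { return max; }
    }

    // Row layout for: select c1, sum(c2), max(c3) from t1 group by c1
    // [0]=C1, then (result, input, aggregator) for SUM at 1..3 and for MAX at 4..6.
    static Object[] aggregateGroup(int c1, int[][] c2c3Rows) {
        Object[] row = {c1, null, null, new Sum(), null, null, new Max()};
        for (int[] r : c2c3Rows) {
            row[2] = r[0]; // SUM input
            ((Aggregator) row[3]).accumulate((Integer) row[2]);
            row[5] = r[1]; // MAX input
            ((Aggregator) row[6]).accumulate((Integer) row[5]);
        }
        // Each aggregator puts its value in the first of its three RCs.
        row[1] = ((Aggregator) row[3]).result();
        row[4] = ((Aggregator) row[6]).result();
        return row;
    }

    // The parent PR picks up GBN column[0], column[1] and column[4].
    static Object[] project(Object[] gbnRow) {
        return new Object[] {gbnRow[0], gbnRow[1], gbnRow[4]};
    }

    public static void main(String[] args) {
        Object[] gbnRow = aggregateGroup(10, new int[][] {{1, 7}, {2, 9}});
        System.out.println(Arrays.toString(project(gbnRow))); // [10, 3, 9]
    }
}
```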

        The addition of these unagg and agg columns to the GroupByNode and to the PRN is performed in addUnAggColumns and addAggregateColumns.

        Note that the GroupByNode is added after the query is optimized (in SelectNode#modifyAccessPaths), which means a fair amount of patching up is needed to account for the generated group by columns.

        Throws:
        StandardException - Thrown on error
      • getParent

        final FromTable getParent()
        Return the parent node to this one, if there is one. It will return 'this' if there is no generated node above this one.
        Returns:
        the parent node
      • toString

        public java.lang.String toString()
        Convert this object to a String. See comments in QueryTreeNode.java for how this should be done for tree printing.
        Overrides:
        toString in class FromTable
        Returns:
        This object as a String
      • printSubNodes

        void printSubNodes(int depth)
        Prints the sub-nodes of this object. See QueryTreeNode.java for how tree printing is supposed to work.
        Overrides:
        printSubNodes in class SingleChildResultSetNode
        Parameters:
        depth - The depth of this node in the tree
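A common shape for depth-based tree printing is for each node to print its own label indented by its depth and then print its sub-nodes one level deeper. The sketch below shows that pattern with a hypothetical `Node` class; it does not reproduce Derby's QueryTreeNode helpers or output format.

```java
import java.util.*;

// Hypothetical node type; Derby's QueryTreeNode printing helpers are not reproduced.
class TreePrintDemo {
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = List.of(children);
        }

        @Override
        public String toString() { return label; }

        // Print this node at its own depth, then let it print its sub-nodes.
        void treePrint(StringBuilder out, int depth) {
            out.append("  ".repeat(depth)).append(this).append('\n');
            printSubNodes(out, depth);
        }

        // depth is this node's depth; each sub-node is printed one level deeper.
        void printSubNodes(StringBuilder out, int depth) {
            for (Node child : children) {
                child.treePrint(out, depth + 1);
            }
        }
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        root.treePrint(out, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        Node tree = new Node("ProjectRestrictNode",
                new Node("GroupByNode",
                        new Node("ProjectRestrictNode",
                                new Node("FromBaseTable"))));
        System.out.print(render(tree));
    }
}
```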
      • flattenableInFromSubquery

        boolean flattenableInFromSubquery(FromList fromList)
        Evaluate whether or not the subquery in a FromSubquery is flattenable. Currently, a FromSubquery is flattenable if all of the following are true:

        • The subquery is a SelectNode.
        • It contains no top-level subqueries. (RESOLVE - we can relax this)
        • It does not contain a GROUP BY or HAVING clause.
        • It does not contain aggregates.
        Overrides:
        flattenableInFromSubquery in class SingleChildResultSetNode
        Parameters:
        fromList - The outer from list
        Returns:
        boolean Whether or not the FromSubquery is flattenable.
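The flattening rule is a plain conjunction of the listed conditions. The sketch below expresses it over a hypothetical `SubqueryInfo` summary record whose fields are illustrative only; Derby evaluates these conditions on its own query tree nodes.

```java
class FlattenableDemo {
    // Hypothetical summary of a FROM-list subquery; the fields are illustrative only.
    record SubqueryInfo(boolean isSelectNode, int topLevelSubqueries,
                        boolean hasGroupBy, boolean hasHaving, int aggregateCount) {}

    // A FromSubquery is flattenable only if every condition holds.
    static boolean flattenable(SubqueryInfo s) {
        return s.isSelectNode()
                && s.topLevelSubqueries() == 0
                && !s.hasGroupBy() && !s.hasHaving()
                && s.aggregateCount() == 0;
    }

    public static void main(String[] args) {
        System.out.println(flattenable(new SubqueryInfo(true, 0, false, false, 0))); // true
        // A grouped query with an aggregate fails two of the conditions.
        System.out.println(flattenable(new SubqueryInfo(true, 0, true, false, 1)));  // false
    }
}
```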
      • isOneRowResultSet

        boolean isOneRowResultSet()
                           throws StandardException
        Return whether or not the underlying ResultSet tree will return a single row, at most. This is important for join nodes, where the extra next() call on the right side can be saved if that side is known to return at most one row.
        Overrides:
        isOneRowResultSet in class SingleChildResultSetNode
        Returns:
        Whether or not the underlying ResultSet tree will return a single row.
        Throws:
        StandardException - Thrown on error
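A scalar aggregate (aggregates with no grouping columns, as described for this class) produces exactly one row. The sketch below assumes, as a simplification, that a null or empty grouping list is the test used here, and shows the saving for a join: when the inner side is known to be one-row, the loop stops after the first row instead of calling next() again just to detect the end of data. `OneRowDemo`, `CountingCursor`, and `innerRows` are hypothetical names.

```java
import java.util.*;

class OneRowDemo {
    // Assumed rule: with no grouping columns, the aggregate yields exactly one row.
    static boolean isOneRowResultSet(List<String> groupingColumns) {
        return groupingColumns == null || groupingColumns.isEmpty();
    }

    // Hypothetical inner-side cursor that counts calls to next().
    static final class CountingCursor {
        private final Iterator<Integer> it;
        int nextCalls;
        CountingCursor(List<Integer> rows) { it = rows.iterator(); }
        Integer next() { nextCalls++; return it.hasNext() ? it.next() : null; }
    }

    // Fetch the inner rows for one outer row; stop early if at most one row is guaranteed.
    static List<Integer> innerRows(CountingCursor inner, boolean oneRow) {
        List<Integer> rows = new ArrayList<>();
        Integer r;
        while ((r = inner.next()) != null) {
            rows.add(r);
            if (oneRow) break; // no extra next() needed to detect end of data
        }
        return rows;
    }

    public static void main(String[] args) {
        CountingCursor a = new CountingCursor(List.of(42));
        innerRows(a, isOneRowResultSet(null));
        CountingCursor b = new CountingCursor(List.of(42));
        innerRows(b, false);
        System.out.println(a.nextCalls + " vs " + b.nextCalls); // 1 vs 2
    }
}
```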
      • considerPostOptimizeOptimizations

        void considerPostOptimizeOptimizations(boolean selectHasPredicates)
                                        throws StandardException
        Consider any optimizations after the optimizer has chosen a plan. Optimizations include:

        • min optimization for scalar aggregates
        • max optimization for scalar aggregates
        Parameters:
        selectHasPredicates - true if SELECT containing this vector/scalar aggregate has a restriction
        Throws:
        StandardException - on error
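The idea behind the min and max optimizations can be sketched with an ordered index on the aggregated column: SQL MIN and MAX ignore NULLs, so MIN is the first non-NULL entry of an ascending scan and MAX is the first non-NULL entry of a backward scan, and either scan can stop after one qualifying row. The sketch below assumes NULLs sort last and models the index as a sorted list; it does not model predicates such as those signalled by `selectHasPredicates`, and all names are hypothetical.

```java
import java.util.*;

class MinMaxIndexDemo {
    // Hypothetical ascending "index": values sorted, with nulls placed last.
    static List<Integer> buildIndex(List<Integer> column) {
        List<Integer> index = new ArrayList<>(column);
        index.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        return index;
    }

    // MIN: scan forward and stop at the first non-null key.
    static Integer min(List<Integer> index) {
        for (Integer v : index) {
            if (v != null) return v;
        }
        return null; // empty or all-null input: MIN is NULL
    }

    // MAX: scan backward and stop at the first non-null key.
    static Integer max(List<Integer> index) {
        for (int i = index.size() - 1; i >= 0; i--) {
            if (index.get(i) != null) return index.get(i);
        }
        return null; // empty or all-null input: MAX is NULL
    }

    public static void main(String[] args) {
        List<Integer> index = buildIndex(Arrays.asList(5, null, 2, 9));
        System.out.println(index);                         // [2, 5, 9, null]
        System.out.println(min(index) + " " + max(index)); // 2 9
    }
}
```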