Aggregate Expressions in Greenplum

Aggregate Expressions

An aggregate expression represents the application of an aggregate function across the rows selected by a query. An aggregate function reduces multiple inputs to a single

output value, such as the sum or average of the inputs. The syntax of an aggregate expression is one of the following:

aggregate_name (expression [ , ... ] ) [FILTER (WHERE condition)]

aggregate_name (ALL expression [ , ... ] ) [FILTER (WHERE condition)]

aggregate_name (DISTINCT expression [ , ... ] ) [FILTER (WHERE condition)]

aggregate_name ( * ) [FILTER (WHERE condition)]

Where aggregate_name is a previously defined aggregate (possibly qualified with a schema name), and expression is any value expression that does not itself contain an

aggregate expression.

The first form of aggregate expression invokes the aggregate across all input rows for which the given expression(s) yield non-null values. The second form is the same as

the first, since ALL is the default. The third form invokes the aggregate for all distinct non-null values of the expressions found in the input rows. The last form invokes the

aggregate once for each input row regardless of null or non-null values; since no particular input value is specified, it is generally only useful for the count(*)

aggregate function.

For example, count(*) yields the total number of input rows; count(f1) yields the number of input rows in which f1 is non-null; count(distinct f1) yields the

number of distinct non-null values of f1.

The FILTER clause allows you to specify a condition to limit the input rows to the aggregate function. For example:

SELECT count(*) FILTER (WHERE gender='F') FROM employee;

The WHERE condition of the FILTER clause cannot contain a set returning function, subquery, a window function, or an outer reference. If using a user-defined aggregate

function, the state transition function must be declared as STRICT (see CREATE AGGREGATE).

Greenplum Database provides special aggregate expressions for inverse distribution functions. These are as follows:

PERCENTILE_CONT(_percentage_) WITHIN GROUP (ORDER BY _expression_)

PERCENTILE_DISC(_percentage_) WITHIN GROUP (ORDER BY _expression_)

Currently only these two expressions are used with the keyword WITHIN GROUP. Greenplum Database also provides the MEDIAN aggregate function. This function

returns the 50 percentile of the PERCENTILE_CONT result.

Limitations of Aggregate Expressions

The following are the limitations of the aggregate expressions:

1. You cannot use these aggregate expressions with the keywords ALL, DISTINCT, FILTER and OVER.

2. You cannot use these aggregate expressions with the grouping specifications, CUBE, ROLLUP, and GROUPING SETS.

3. An aggregate expression may only appear in the result list or HAVING clause of a SELECT command. It is forbidden in other clauses, such as WHERE, because those clauses are logically evaluated before the results of aggregates are formed.

4. When an aggregate expression appears in a subquery, the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate’s arguments contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query. The aggregate expression as a whole is then an outer reference for the subquery it appears in, and acts as a constant over any one evaluation of that subquery. The restriction about appearing only in the result list or HAVING clause applies with respect to the query level that the aggregate belongs to.

Greenplum Database currently does not support DISTINCT with more than one input expression