ROLLUP processes groups of input records that have the same key, generating one output record for each group. Typically, the output record summarizes or aggregates the data in some way; for example, a simple ROLLUP might calculate a sum or average of one or more input fields. ROLLUP can select certain information from each group; for example, it might output the largest value in a field, or accumulate a vector of values that conform to specific criteria.
Two modes to use ROLLUP-
You can use a ROLLUP component in two modes, depending on how you define the transform parameter:
1) Template mode — You define a simple rollup function that may include aggregation functions. Template mode is the most common way to use ROLLUP.
2) Expanded mode — You create a transform using an expanded rollup package. This mode allows for rollups that do not necessarily use regular aggregation functions.
Using ROLLUP in template mode
Template mode is the simplest way to use the ROLLUP component. In the component transform, you can specify an aggregation function that describes how the data should be “rolled up,” or summarized, usually in some cumulative way. For example, suppose you have an input record for each purchase by each customer. You could use the sum aggregation function to determine the total amount spent by each customer.
A template ROLLUP transform has only one required function, rollup. It has one input argument, the input record; and one output argument, the output record:
out :: rollup(in)
Using ROLLUP in expanded mode
Expanded mode provides more control over the transform. It lets you edit the expanded package, so you can specify transformations that are not possible with template mode. As such, you might use it when you need a result that an aggregation function cannot produce. However, expanded mode is not commonly used: for most results, using an aggregation function is sufficient.
If you want to use aggregation functions, add them to the transform before switching from template mode to expanded mode, and don’t use them outside of the rollup function. Any aggregation functions added after the transform has been expanded will likely result in an error.
With an expanded ROLLUP package, you must define the following items:
DML type named temporary_type
initialize function that returns a temporary_type record
rollup function that takes two input arguments (an input record and a temporary_type record) and returns an updated temporary_type record
finalize function that returns an output record
Examples of an expanded rollup
Top values
For example, using input data that consists of purchases keyed by customer ID, suppose you want to know the three highest purchases made by each customer. You can use an expanded rollup to output a vector containing these three values.
The output record format consists of the following fields:
record
decimal(8) custid;
decimal(6,2)[int] largest_purchases;
string(1) newline = "\n";
end
The expanded transform function is as follows:
constant int how_many = 3;
type temporary_type =
record
decimal(6,2)[int] largest;
end;
temp :: initialize(in) =
begin
temp.largest :: [vector];
end;
out :: rollup(temp, in) =
begin
out.largest :: vector_slice(vector_merge(temp.largest, [vector in.purchase], {descending}), 0, how_many - 1);
end;
out :: finalize(temp, in) =
begin
out.largest_purchases :: temp.largest;
out.custid :: in.custid;
end;
The following sections describe the types and functions in detail.
1) temporary_type — This type definition consists of a vector to store temporary results.
2) initialization — This function, called at the beginning of each key group, initializes the value of the temporary type. The vector largest is initialized as an empty vector.
3) rollup — The rollup function is called for every input record. In this example, the output is a vector containing the current top three purchases. The function starts by merging (using vector_merge) the current value of in.purchase with the existing temporary type, largest, while maintaining the elements in order from highest to lowest. Then vector_slice is called on the merged vector, returning a vector containing only the first three elements.
4) finalize — The finalize function is called after the last record in a key group has been processed. It produces the output record.
No comments:
Post a Comment