Thursday, 26 January 2023

Rollup Component

ROLLUP processes groups of input records that have the same key, generating one output record for each group. Typically, the output record summarizes or aggregates the data in some way; for example, a simple ROLLUP might calculate a sum or average of one or more input fields. ROLLUP can select certain information from each group; for example, it might output the largest value in a field, or accumulate a vector of values that conform to specific criteria.

Two modes to use ROLLUP-
You can use a ROLLUP component in two modes, depending on how you define the transform parameter:

1) Template mode — You define a simple rollup function that may include aggregation functions. Template mode is the most common way to use ROLLUP. 

2) Expanded mode — You create a transform using an expanded rollup package. This mode allows for rollups that do not necessarily use regular aggregation functions.

Using ROLLUP in template mode
Template mode is the simplest way to use the ROLLUP component. In the component transform, you can specify an aggregation function that describes how the data should be “rolled up,” or summarized, usually in some cumulative way. For example, suppose you have an input record for each purchase by each customer. You could use the sum aggregation function to determine the total amount spent by each customer.

A template ROLLUP transform has only one required function, rollup. It has one input argument, the input record; and one output argument, the output record:

out :: rollup(in)

Using ROLLUP in expanded mode
Expanded mode provides more control over the transform. It lets you edit the expanded package, so you can specify transformations that are not possible with template mode. As such, you might use it when you need a result that an aggregation function cannot produce. However, expanded mode is not commonly used: for most results, using an aggregation function is sufficient. 

If you want to use aggregation functions, add them to the transform before switching from template mode to expanded mode, and don’t use them outside of the rollup function. Any aggregation functions added after the transform has been expanded will likely result in an error.

With an expanded ROLLUP package, you must define the following items:

DML type named temporary_type 

initialize function that returns a temporary_type record 

rollup function that takes two input arguments (an input record and a temporary_type record) and returns an updated temporary_type record 

finalize function that returns an output record 

Examples of an expanded rollup
Top values
For example, using input data that consists of purchases keyed by customer ID, suppose you want to know the three highest purchases made by each customer. You can use an expanded rollup to output a vector containing these three values. 

The output record format consists of the following fields:

record
  decimal(8) custid;
  decimal(6,2)[int] largest_purchases;
  string(1) newline = "\n";
end
The expanded transform function is as follows: 

constant int how_many = 3;

type temporary_type =
record
  decimal(6,2)[int] largest;
end;

temp :: initialize(in) =
begin
  temp.largest :: [vector];
end;

out :: rollup(temp, in) =
begin
  out.largest :: vector_slice(vector_merge(temp.largest, [vector in.purchase], {descending}), 0, how_many - 1);
end;

out :: finalize(temp, in) =
begin
  out.largest_purchases :: temp.largest;
  out.custid :: in.custid;
end;

The following sections describe the types and functions in detail.

1) temporary_type — This type definition consists of a vector to store temporary results. 

2) initialization — This function, called at the beginning of each key group, initializes the value of the temporary type. The vector largest is initialized as an empty vector. 

3) rollup — The rollup function is called for every input record. In this example, the output is a vector containing the current top three purchases. The function starts by merging (using vector_merge) the current value of in.purchase with the existing temporary type, largest, while maintaining the elements in order from highest to lowest. Then vector_slice is called on the merged vector, returning a vector containing only the first three elements.

4) finalize — The finalize function is called after the last record in a key group has been processed. It produces the output record. 



Friday, 20 January 2023

Ab Initio ETL tool architecture


Co>Operating System

Co>Operating System is a program provided by Abinitio which operates on the top of the operating system and is a base for all Abinitio processes. It provides additional features known as air commands which can be installed on a variety of system environments such as Unix, HP-UX, Linux,IBM AIX, Windows systems. The Abinitio CoOperating System provides the following features:

- Manage and run Abinitio graphs and control the ETL processes

- Provides Abinitio extensions to the operating system

- ETL processes monitoring and debugging

- Metadata management and interaction with the EME


Abinitio GDE (Graphical Development Environment)

GDE is a graphical application for developers which are used for designing and running Abinitio graphs. It also provides:

- The ETL process in Abinitio is represented by Abinitio graphs. Graphs are formed by components (from the standard components library or custom), flows (data streams) and parameters.

- A user-friendly front-end for designing Ab Initio ETL graphs

- Ability to run, debug Ab Initio jobs and trace execution logs

- GDE Abinitio graph compilation process results in generation of a UNIX shell script which may be executed on a machine without the GDE installed


Abinitio EME

Enterprise Meta>Environment (EME) is an Abinitio repository and environment for storing and managing metadata. It provides capability to store both business and technical metadata. EME metadata can be accessed from the Ab Initio GDE, web browser or Abinitio CoOperating system command line (air commands)

how to create dml dynamically in Ab-initio

 $[ begin let int i = 0; let string(int) complete_list = "emp_nm,billable,age,designation,location"; let string(int) file_content ...