Tuesday, 28 September 2021

Describe the Evaluation of Parameters order and first_defined function-

Following is the order of evaluation:

- Host setup script will be executed first

- All Common parameters, that is, included, are evaluated

- All Sandbox parameters are evaluated

- The project script – project-start. ksh is executed

- All form parameters are evaluated

- Graph parameters are evaluated

- The End Script of the graph is executed

first_defined -

Returns the first defined (non-NULL) value from among the input arguments. The input expressions must all be of the same or compatible types.

Syntax

object first_defined(expr [ , expr ... ] )

Usage -

- > You must specify at least two expressions as arguments for this function.

- > This function is useful in assigning default values to input fields whose values are NULL. The function does the following:

- > Checks each input value to see if the value is defined.

- > Stops at the first defined value.

- > Returns the first defined value.

- > If all arguments evaluate NULL, this function returns NULL.

This function is equivalent to the SQL function COALESCE and similar to the older Oracle function NVL.

Example 1 -

first_defined(expr1 = (decimal(3))1.1, expr2 = "jan", expr3 = "feb", expr4 = "mar")

(Types long and decimal are compatible in the preceding expression.) The function returns the following result:

"1.1"

Example 2 -

In this example, some of the function’s arguments are NULL:

first_defined(NULL, 42, NULL, 43)

The function returns the following result:

Replicate vs Broadcast

BROADCAST COMBINES IN AN ARBITRARY ORDER ALL RECORDS IT RECEIVES INTO A SINGLE FLOW AND WRITES A COPY OF THAT FLOW TO EACH OF ITS OUTPUT FLOW PARTITIONS.

USE BROADCAST TO INCREASE DATA PARALLELISM WHEN YOU HAVE CONNECTED A SINGLE FAN-OUT FLOW TO THE OUT PORT, OR TO INCREASE COMPONENT PARALLELISM WHEN YOU HAVE CONNECTED MULTIPLE STRAIGHT FLOWS TO THE OUT PORT.

REPLICATE ARBITRARILY COMBINES ALL RECORDS IT RECEIVES INTO A SINGLE FLOW AND WRITES A COPY OF THAT FLOW TO EACH OF ITS OUTPUT FLOWS. USE REPLICATE TO SUPPORT COMPONENT PARALLELISM — SUCH AS WHEN YOU WANT TO PERFORM MORE THAN ONE OPERATION ON A FLOW OF RECORDS COMING FROM AN ACTIVE COMPONENT.

REPLICATE VERSUS BROADCAST-
1) REPLICATE AND BROADCAST ARE SIMILAR COMPONENTS, SO IT CAN BE DIFFICULT TO KNOW WHICH ONE TO USE IN A PARTICULAR GRAPH.

2) BROADCAST IS USED TO INCREASE DATA PARALLELISM BY FEEDING RECORDS TO FAN-OUT OR ALL-TO-ALL FLOWS.

3) REPLICATE IS GENERALLY USED TO INCREASE COMPONENT PARALLELISM, EMITTING MULTIPLE STRAIGHT FLOWS TO SEPARATE PIPELINES.

SPECIFICALLY, THE DIFFERENCE BETWEEN THEM LIES IN HOW THEIR FLOWS ARE SET UP AND HOW THEIR LAYOUTS ARE PROPAGATED IN THE GDE.

4) REPLICATE ALLOWS MULTIPLE OUTPUTS FOR A GIVEN LAYOUT AND PROPAGATES THE LAYOUT FROM THE INPUT TO THE OUTPUT.

5) BROADCAST IS A PARTITIONING COMPONENT THAT DEFINES THE TRANSITION FROM ONE LAYOUT TO ANOTHER.

The Runtime behavior of REFORMAT -

1. REFORMAT reads a record from the in port.

2. If the select parameter has an expression specified, REFORMAT uses the expression to evaluate the input record:

-> If the expression evaluates to false (0), REFORMAT discards the input record and starts over with step

-> If the expression produces NULL, REFORMAT writes a descriptive error message and stops the execution of the graph.

-> If the expression evaluates to true (anything other than 0 or NULL), REFORMAT begins processing the input record.

3) If the select parameter does not have a value, REFORMAT begins processing the input record.

4) REFORMAT determines whether a transform function is specified in either the output-index or output-indexes parameter:

-> If neither output-index nor output-indexes have a value (the usual case when there is only one out port), REFORMAT sends the input record to every transform-out port pair, beginning with out0 and progressing sequentially.

-> If output-index or output-indexes has a value, REFORMAT evaluates the specified index transform. If output-indexes are defined, it should return a vector of port index values. If output-index is defined, it should return a single port index value.

REFORMAT uses one or more values from the index transform to determine the appropriate transform-output port pair or pairs for the input record. If the index transform returns more than one value, REFORMAT sends the record to each of the appropriate ports, starting with the lowest numbered port and progressing to the other ports sequentially.

If the index transform returns an error, REFORMAT rejects the input record to the reject0 port (where it is indistinguishable from records rejected by transform 0).

5) REFORMAT determines whether each out port has a transform function.

-> If an out port does not have a transform function, REFORMAT uses implicit reformat to process the input record.

-> If the input record is sent to more than one port, the order of the transform evaluation is sequential: it calls the transform function on each port in order, starting with the lowest numbered port. For example, if the record is to be sent to port0 and port2, it is sent to port0 first, and then to port2. The evaluation of the second transform can depend on the side-effects of the first transform, which means you could make successive calls to a function like next_in_sequence from sequential transforms for the same input record.

6) If a transform function results in an error or returns NULL, REFORMAT writes the following:

An error message to the corresponding error port

The current input record to the corresponding reject port

The component stops execution of the graph when the number of reject events exceeds the reject threshold

If the reject or error ports do not have flows attached to them, REFORMAT discards the record.

7) REFORMAT writes the record to the out port of each successful transform, and then begins processing the next input record.

max-core For Abinitio Components

max-core -

The max-core parameter is found in the SORT, JOIN, and ROLLUP components, among others.

max-core Value -

There is no single optimal value for the max-core parameter. A “good” value depends on your particular graph and the environment in which it runs, as well as on the data.

Details -

A component’s max-core parameter determines the maximum amount of memory the component will consume per partition before it spills to disk.

When the value of max-core is exceeded, all input (in the case of SORT) or the excess input (in the cases of the other components) is dropped to disk in the form of temporary files. This can have a dramatic impact on performance, but it does not mean that it is always better to increase the value of max-core in these situations.

The higher you set the value of max-core, the more memory the component can use. Using more memory generally improves performance — up to a point.

Beyond this point, performance will not improve and may even worsen.

If the value of max-core is set too high, operating system swapping can occur; if virtual memory on the machine is exhausted, the graph can fail.

When setting the value for max-core, we can use the suffixes k, m, and g (uppercase is also supported) to indicate powers of 1024.

For max-core, the suffix k (kilobytes) means precisely 1024 bytes. Similarly, the suffix m (megabytes) means precisely 1048576 (1024 power 2),

and g (gigabytes) means precisely 1024 power 3.

In general, using additional memory can improve the performance of in-memory ROLLUP or JOIN, but not of SORT.

When spillage occurs, consider setting the configuration variable AB_SPILL_FILE_COMPRESSION_LEVEL.

This variable compresses the temporary files spilled to the disk. It is most helpful when you have a fast CPU but a slow disk (which is common).

SORT component -

For the SORT component, 96 MB (100663296 bytes) is the default value for max-core.

Friday, 24 September 2021

Abinitio - Partitioning and De-Partitioning Component

There are two types of partitioning components: partition components, which create data partitions, and departition components, which combine multiple flow partitions or multiple straight flows into a single flow:

De-Partitioning Component -

CONCATENATE appends multiple flow partitions of records one after another.

GATHER arbitrarily combines records from multiple flow partitions.

INTERLEAVE combines blocks of records from multiple flow partitions in a round-robin fashion.

MERGE combines records from multiple flow partitions that have all been sorted according to the same key specifier and maintains the sort order.

Partitioning Component -

BROADCAST arbitrarily combines all the records it receives into a single flow and writes a copy of that flow to each of its output flow partitions.

PARTITION BY EXPRESSION distributes records to its output flow partitions according to a specified DML expression.

PARTITION BY KEY distributes records to its output flow partitions according to key values.

PARTITION BY PERCENTAGE distributes a specified percent of the total number of input records to each output flow.

PARTITION BY RANGE distributes records to its output flow partitions according to the ranges of key values specified for each partition.

PARTITION BY ROUND-ROBIN distributes records evenly to each output flow.

PARTITION WITH LOAD BALANCE distributes records to its output flow partitions, writing more records to the flow partitions that consume records

Abinitio - Transform Function (Rollup,Scan,Join,Reformat,)

Rollup -

Scan -

Join -

When Join Type is explicit -

Tuesday, 7 September 2021

SQL Performance Tuning Technique

1) SELECT fields instead of using SELECT *

2) Avoid SELECT DISTINCT. ...

3) Create joins with INNER JOIN (not WHERE)

4) Use WHERE instead of HAVING to define filters

5) Use wildcards at the end of a phrase only

6) Use LIMIT to sample query results.

7) Avoid Nested Queries & Views.

8) Use IN predicate while querying Indexed columns.

9) Use temp tables.

10) Use CASE instead of UPDATE

11) Avoid using OR in JOINS

What are views in the SQL database?

In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.

What is the use of view?

Views enable us to hide some of the columns from the table. It simplifies complex queries because it can draw data from multiple tables and present it as a single table. It helps in data security that shows only authorized information to the users.

Indexes -

Indexes are special lookup tables that the database search engine can use to speed up data retrieval. In another word, an index is a pointer to data in a table.

An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the UPDATE and the INSERT statements.

Abinitio Interview Questions