Friday, 30 July 2021

SCD2 Implementation in Abinitio-

We can implement SCD2 using the below scenario in abinitio:-

1. Take 2 tables as your input first would be your today's file that is in0 second be your previous file that would be in1.

2. Take the inner join of both the tables on the matching key say cust_id, and in the outport of the DML make it embedded and do the naming convention by adding the suffix _new for today's file and _old for the prev day file data records so to make it easier to understand the data fields.

3. In the join components, unused 0 will give you inserted records(that would come from today's file), and unused 1(that would come from yesterday's file) will give you deleted records.

4. Take a reformat connecting the join output port and check:

if ( _new != _old) (these are the suffix we have given in the DML output port coming from the join component), force error it from the reject port of reformat those will be your updated records and you will get the unchanged records from the output port of reformat.

5. combine all the inserted records from joining unused 0, updated records from rejecting port of reformat, and unchanged records from outport of reformat and load all of them into the delta table. 





Sunday, 11 July 2021

Cognizant Abinitio Interview - 2021

 1) Run time behavior of Reformat  Component?

2) Difference between Rollup and Scan?

3) Departioning Component?

4) Types of Parallelism?

5) 30 days Older files?

6) Phase and Checkpoint?

7) Lookup File and Lookup Difference?

8) When to use Lookup instead of Join?

9) Scenario - Loading of 1st record into 1st Table, 2nd record into 2nd Table and rest of the records in 3rd Table 

10) Partition by round-robin definition and also how it divides the records? 

What is Block in Partition by round-robin? 

11) SCD type 1,2,3,4

12) Difference between AB_WORK_DIR and AB_DATA_DIR

13) What is dependency analysis and how we can implement it ?

14) Checkin and checkout command ?Checkout specific version of an object ?




Sql Interview Question -

 1) 2nd Max salary -

SELECT max(salary)  FROM emp_table WHERE salary < (SELECT max(salary)  FROM emp_table);

or

SELECT max(salary) FROM emp_table WHERE salary not in  (SELECT max(salary)                  FROM emp_table); 


2) Nth max salry -

select id,saalry from emp e1 where Nth - 1 = (select count(distinct salary) from emp e2 where e2.salary > e1.salary )

3) Delete Duplicates From a Table in SQL Server

WITH cte AS (

    SELECT 

        contact_id, 

        first_name, 

        last_name, 

        email, 

        ROW_NUMBER() OVER (

            PARTITION BY 

                first_name, 

                last_name, 

                email

            ORDER BY 

                first_name, 

                last_name, 

                email

        ) row_num

     FROM 

        sales.contacts

)

DELETE FROM cte

WHERE row_num > 1;

 (CTE is  common table expression (CTE) )


4) Using GROUP BY clause to find duplicates in a table

SELECT      a,     b,     COUNT(*) occurrences FROM t1  GROUP BY     a,     b 

HAVING     COUNT(*) > 1;

5) Using ROW_NUMBER() function to find duplicates in a table

The following statement uses the ROW_NUMBER() function to find duplicate rows based on both a and b columns: 

WITH cte AS (

    SELECT 

        a, 

        b, 

        ROW_NUMBER() OVER (

            PARTITION BY a,b

            ORDER BY a,b) rownum

    FROM 

        t1

SELECT 

  * 

FROM 

    cte 

WHERE 

    rownum > 1;


Saturday, 10 July 2021

Abinitio Performance Improvement Technique- Abinitio Performance Improvement Technique-

The following are the ways to improve the performance of a graph :

-Try to use partitioning in the graph

-try minimizing the number of components

- Maintain lookups for better efficiency

-Components like join/ rollup should have the option. Input must be sorted if they are placed after a sort component 

- If a component has In memory: Input need not be sorted option selected, use the MAX_CORE parameter value efficiently.

- Ensure that all the graphs where RDBMS tables are used as input, the join condition is on indexed columns.

- Make sure that a limited number of components are used in a particular phase

- Implement the usage of optimum value of max core values for the purpose of sorting and joining components.

- Utilize the minimum number of sort components.

- Utilize the minimum number of sorted join components and replace them with in-memory join/hash join, if needed and possible.

- Restrict only the needed fields in sort, reformat, join components

- Utilize phasing or flow buffers when merged or sorted joins

- Use sorted join, when two inputs are huge, otherwise use hash join

Thursday, 8 July 2021

Barclays Abinitio Developer Interview Question -

1) SCD  type 2 implementation, full load, daily delta load explanation?

2) What are the components that we are using in SCD type 2 implementation?

3) Types of parallelism?

4) Performance improvement technique?

5) Which component breaks pipeline parallelism?

6) Does sort within-group component breaks pipeline parallelism?

7) What all are the components you have worked on?

8) scenario - max of the transaction? if the data volume is high what should be the optimized approach?
i/p- partition by round-robin - Rollup ( 2 stage routing)

9) Count the number of records from a file in Abinitio? 
I/P - Rollup -O/P
Rollup - 
key - null 
transformation - use count function

10) What all are the script you have worked on?

11) m_db command, air command to check-in, checkout and compare with eme version copy?

12) Migration of code from one env to another environment?

13) Type of lookup file - Lookup and lookup_local?

14) In which scenario we should use lookup and Join?

14) (Unix) Remove empty lines from a flat file? 

15) Delete duplicate from a file without using Dedup sorted component?

Using Rollup key change or  Rollup in the expanded mode we can avoid duplicate(first and last function)

Using rollup in expanded mode -
 
I/P -
Transaction 
1500
1500
2000
2000
2000
4000
4000

Rollup key - Transaction
Transformation - out.Transaction :: first(in.Transaction);
or 
 out.Transaction :: last(in.Transaction);

Using Rollup Key change function
out::rollup(in)=
begin
out.*::in.*;
end;

out::key_change(in1,in2)=
begin
out::in2.Transaction !=in1.Transaction;
end;

15)Read multiple files and write multiple files?

16) (Unix) 30 days older files in a directory find command?

17) (Unix) Command to find the transaction not in 1500 and 2000 in a file?
grep -v '1500/|2000' input_file.dat

18) (Unix) Command to find the transaction greater than 2600?
awk '$1 > 2600' input_file.dat


Wednesday, 7 July 2021

Abinitio Theoretical Interview Question 2021-

Parallelism - 

 There are three types of parallelism that Abinitio supports - 

component parallelism, pipeline  parallelism, and data parallelism

 1) Component parallelism occurs when more than one component is running at the same time on different streams of data. 

 For example, you could use two FILTER BY EXPRESSION components at the same time to find transactions greater than and less than certain amounts:

 2) Pipeline parallelism occurs when two or more components process different parts of the same data stream at the same time. This can happen when the first component produces output records before it has finished reading all its input.  This enables the next component to begin processing before the first one has finished.

 Any component that must read all its input before producing any output is said to “break” pipeline parallelism because the next component must wait for that component to finish before starting. 

For example, a SORT component always breaks pipeline parallelism:  it must see all the records before producing any output because the last record it reads might be the first one in the sort order. 

Similarly, a ROLLUP component whose sorted-input parameter is set to False breaks pipeline parallelism, as its results cannot be produced until the input stream has been completely consumed.

3) Data parallelism occurs when a graph separates data into multiple divisions, allowing multiple copies of program components to operate on the data in all the divisions simultaneously.


Max memory - when sorted input set to true then max memory ( Maximum memory usage (before spilling to disk) in bytes)

when sorted input set to false that is in-memory sort then driving port and max core comes into the picture.

Driving port -  The largest input; all other inputs will be read into memory.

Maxcore - Maximum memory usage (for non-driving inputs before spilling to disk) in bytes.

To print a record whose format or data is corrupt, use the -show-partial argument in the m_dump command.

To count the number of records in a printed output of records, use the -print-n-records argument.-

Using m_dump with -print-n-records

Phases -

A phase is a stage of a graph that runs to completion before the start of the next phase. By dividing a graph into phases, we can make the best use of resources such as memory, disk space, and CPU cycles ensuring, for example, that sufficient resources will be available for an especially demanding part of the job.

The boundary between two phases is called a phase break, and it belongs to the first of the two phases.

In the process of completing one phase before the next begins, the component immediately before a phase break writes all the data passing through it into temporary files in the layout of the component immediately after the phase break. When the first phase finishes running, the components after the phase break read these temporary files to begin the next phase.

Checkpoints -

A checkpoint is a point at which the Co>Operating System saves all the information it would need to restore a job to its state at that point. In case of failure, you can recover completed phases of a job up to the last completed checkpoint. 

In batch graphs, you can have checkpoints only at phase breaks. When you set a phase in the GDE, by default it has a checkpoint at its phase break. We can remove the checkpoint by clicking the Toggle Checkpoint button Icon for the Toggle Checkpoint button from the on position to the off position Icon for the Toggle Checkpoint button as it appears with the checkpoint removed.

As the execution of the graph successfully passes the first checkpoint, the Co>Operating System saves all the information it needs to restore the job to its state at that checkpoint.

As the execution of the graph successfully passes each succeeding checkpoint, the Co>Operating System:

Deletes the information it has saved to be able to restore the job to its state at the preceding checkpoint

Deletes the temporary files it has written in the layouts of the components in all phases since the preceding checkpoint

Commits the effects on the filesystem of all phases since the preceding checkpoint.





 

Abinitio-Unix Scenario Part2 -

 1) How to check if the file size is greater than 10 MB?

find ./ -maxdepth 1 -type f -size +10M

2) How to check active running services related to abinitio?

m_ps | guess 

3) city1 city2 distance

     A          B      100

     B          A      100

     A          C      200

o/p- 

A    B    100

A    C     200

Ans ->  I/p - Rollup - Reformat - sort - dedup - o/p

Rollup - 

sorted input - false

key - city1,city2

Transform - 

 out.city1 :: accumulation(in.city1)

 out.city2 :: accumulation(in.city2)

 out.distance :: in.distance;

Reformat - 

create vector variable source_city assign it to  vector_sort(vector_concat(in.city1.in.city2))

out.city1::source_city[0];

out.city2::source_city[1];

out.distnace::in.distnace;

sort -

 key - city1,city2

dedup - 

key - city1,city2

keep - first 

4)I/P-  

F1   F2

1    (a,b,c,d,e)

2    (a,b,c,d,e)

3    (a,b,c,d,e)

O/P- 

F1    F2

1      a

2      b

3      c

Ans -> I/P -Normalize -O/P

Normalize - 

Length -3

normalize - out.F2 :: string_substring(string_replace(in.F2,",",""),index+2,1)


5) Count the no. of words in below I/P  file ? 

city

 Bangalore

Occurance of "a" is twice.

Ans- > I/P-Redefine-Rollup-O/P

Redefine - 

record

string(1) city;

end

Rollup - Sorted input - False

key - city;

out,city::in.city;

out.count:: count(in.city);


6) Input - 

ID

8

5

9

3

2

What should be the output if we use ID >='2' in select and next_in_sequence in the transformation of Reformat component?

Ans ->

ID Seq

8    1

5    2

9    3

3    4

2    5


What should be the output if we use ID >'2' in select and next_in_sequence in the transformation of Reformat component?

ID Seq

8    1

5    2

9    3

3    4

7) Input 

ID ID1 ID2

A 100 NULL

A NULL 200

B 200 400

B NULL NUll

Output - 

ID  ID2 ID3

A   100  200

B   200  400

Ans->

I/P- Rollup -O/P

Rollup - 

Key - ID

Transform -

out.ID::in.ID;

out.ID1::string_filter_out(concatenation(in.ID1),"NULL")

out.ID2::string_filter_out(concatenation(in.ID2),"NULL")



Monday, 5 July 2021

Abinitio Scenario Questions 2021

 1) Input - 

ID

A

B

C

Output - 

ID

AB

BC

CD 


Ans ->

 Input FIle -  Rollup - Normalization - Output file

Rollup - 

Key - Null key 

Transform - concatenation(in.ID)

Normalization -

 length -> out::string_length(in.id) -1;

Normalize -> out.ID::string_substring(in.ID,index+1,2);


2) Input - 

Index Data 

1        A

2        B

3        C

4        D


output - 

Index Data

1        A

2       AB

3       ABC

4      ABCD


Ans ->

 I/p file - Scan- O/P File

Scan

Key - {}

Transform - 

out.Index :: in.Index;

out.Data :: concatenation(in.Data);


3) Input - 

f1  f2

1   a

2   b

3   c

Output - 

f1 f2

1  abc

2  abc

3  abc

Ans ->

Input File - Reformat (Count = 2)  - Rollup - Join - O/P file 

Reformat -   1st Flow pass f1 field and  2nd pass f2 column.

connect 2nd flow with Rollup (Key - {} , Transform - concatenation(in.f2))

Join reformat 1st flow with Rollup output keep key as {} and Join type as OUTER JOIN .

4) How to reverse a String ?

Input - abinitio

Output- oitiniba

Ans ->

Input File - Reformat  - O/P file 

Reformat - out.ID :: in.ID[::-1]

5)  Input -

Gender 

M

F

M

F

T

T

M

F

What will be the output if we use next_in_sequence()  in select parameter and  Transformation of Reformat component?

Ans ->

Output - 

Gender  Sequence

M           2

F            4

M           6

F            8

T           10

T           12

M          14

F           16

What will be the output if we use next_in_sequence() >=2  in select parameter and  In Transformation next_in_sequence() of Reformat component?

Ans - 

Gender  Sequence

M           3

F            5

M           7

F            9

T           11

T           13

M          15

F           17


6) Output Index and Indexes - 

Input - 

Gender 

M

F

M

F

T

T

M

F

What will be the output of in  Output Index we will pass M into port 0,1 and F into port 1?

Ans ->

The graph will fail as we have not assigned any port for Gender "T"  and the exact error will be - NULL value in assignment for output_indexes_out .





    

how to create dml dynamically in Ab-initio

 $[ begin let int i = 0; let string(int) complete_list = "emp_nm,billable,age,designation,location"; let string(int) file_content ...