Wednesday, 10 November 2021

ICFF Lookup -

When your lookup file is large and stored as a multifile (MFS), you should consider an ICFF lookup.

The Block-Compressed Lookup component creates 2 files: a block-compressed data file and an index file containing indexes that refer to the blocks in the data file.

This is a kind of dynamic lookup file that loads data into memory only when it is referenced.

For example: you have an existing graph joining 2 files, one file with around 100 million records and the other with around 50 million. This job will take considerable time to join the two files.

If you are not pulling many fields from one of the files, you can make it a lookup and speed up your process. You can create a block-compressed lookup file from one of the files. This process will create 2 files: a compressed data file and an index file containing an index entry for each block of the data file.

Now you can read the other file as a single flow and, in a Reformat, use the lookup_load function to load only the specific block into memory and perform the lookup against it.

This will save memory and speed up the process.

How does an Indexed Compressed Flat File work?

To create an ICFF, we need presorted data. The WRITE BLOCK-COMPRESSED LOOKUP component compresses the data and chunks it into blocks of roughly equal size. The graph then stores the set of compressed blocks in a data file; each data file is associated with a separately stored index that contains pointers back to the individual data blocks. Together, the data file and its index form a single ICFF. A crucial feature is that, during a lookup operation, most of the compressed lookup data remains on disk - the graph loads only the relatively tiny index file into memory.
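The mechanics can be illustrated with a small Python sketch. This is only a conceptual model, not the actual Ab Initio file format: the block size, record encoding, and function names here are illustrative assumptions. Presorted records are compressed into blocks, an index maps each block's first key to its byte offset, and a lookup keeps the whole index in memory but decompresses only the single block that can contain the key.

```python
import bisect
import zlib

BLOCK_SIZE = 3  # records per block; illustrative only, real blocks are much larger

def build_icff(records):
    """Compress presorted (key, value) records into blocks; return
    (data_blob, index) where index holds (first_key, offset, length)."""
    data = bytearray()
    index = []
    for i in range(0, len(records), BLOCK_SIZE):
        block = records[i:i + BLOCK_SIZE]
        payload = zlib.compress(
            "\n".join(f"{k}|{v}" for k, v in block).encode())
        index.append((block[0][0], len(data), len(payload)))
        data += payload
    return bytes(data), index

def lookup(data, index, key):
    """Binary-search the in-memory index, then decompress only the
    one block whose key range can contain the key."""
    keys = [entry[0] for entry in index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None
    _, offset, length = index[pos]
    block = zlib.decompress(data[offset:offset + length]).decode()
    for line in block.split("\n"):
        k, _, v = line.partition("|")
        if int(k) == key:
            return v
    return None

records = [(i, f"cust_{i}") for i in range(0, 20, 2)]  # presorted keys
data, index = build_icff(records)
print(lookup(data, index, 8))   # cust_8
print(lookup(data, index, 9))   # None (key absent)
```

The point of the sketch is the memory profile: the index is a few tuples per block, while the bulk of the data stays compressed and is touched only one block at a time.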

Generation - 

Data can be added to an ICFF even while a graph is using it. Each chunk of added update data is called a generation. Each generation is compressed separately; it consists of blocks, just like the original data, and has its own index, which is simply concatenated with the original index.

How Generations are created - 

As an ICFF generation is being built, the ICFF building graph writes compressed data to disk as the blocks reach the appropriate size. Meanwhile, the graph continues to build an index in memory. In a batch graph, an ICFF generation ends when the graph or graph phase ends. In a continuous graph, an ICFF generation ends at a checkpoint boundary. Once the generation ends, the ICFF building graph writes the completed index to disk.
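The generation mechanism can also be sketched in Python. Again, this is an illustrative model rather than the real on-disk format: each generation here is a single compressed block with a tiny index (its key range), appending a generation never rewrites older ones, and lookups search generations newest-first so updated values win.

```python
import zlib

def make_generation(records):
    """Compress one generation of presorted (key, value) records into a
    single block with its own tiny index (first and last key).
    Real generations hold many blocks; one block keeps the sketch short."""
    payload = zlib.compress(
        "\n".join(f"{k}|{v}" for k, v in records).encode())
    return {"lo": records[0][0], "hi": records[-1][0], "data": payload}

def append_generation(icff, records):
    """Adding data never rewrites old generations: the new one is simply
    appended, and its index sits alongside the existing ones."""
    icff.append(make_generation(records))

def lookup(icff, key):
    """Search generations newest-first so the latest update wins."""
    for gen in reversed(icff):
        if not (gen["lo"] <= key <= gen["hi"]):
            continue
        for line in zlib.decompress(gen["data"]).decode().split("\n"):
            k, _, v = line.partition("|")
            if int(k) == key:
                return v
    return None

icff = []
append_generation(icff, [(1, "v1"), (2, "v1")])   # original data
append_generation(icff, [(2, "v2"), (3, "v2")])   # later update generation
print(lookup(icff, 2))   # v2 - newest generation wins
print(lookup(icff, 1))   # v1 - still served from the old generation
```

Because old generations are immutable, readers already holding the file can keep serving lookups while a writer appends the next generation, which is what allows updates "without any pause in processing".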


ICFFs present advantages in a number of categories:

Disk requirements - Because ICFFs store compressed data in flat files without the overhead associated with a DBMS, they require much less disk storage capacity than databases - on the order of 10 times less.

Memory requirements - Because an ICFF organizes data in discrete blocks, only a small portion of the data needs to be loaded into memory at any given time.

Speed - ICFFs allow you to create successive generations of updated information without any pause in processing. That means the time between a transaction taking place and the results of that transaction becoming accessible can be a matter of seconds.

Performance - Making a large number of queries against database tables that are continually being updated can slow down a DBMS. In such applications, ICFFs outperform databases.

Volume of data - ICFFs can easily accommodate very large amounts of data - so large, in fact, that it can be feasible to take hundreds of terabytes of data from archive tapes, convert it into ICFFs, and make it available for online access and processing.

