You may sometimes need to merge a large dataset with a small one. You can do this with either a JOIN or a "lookup plus reformat". The lookup file approach may run faster because lookup files are held in memory. To use "lookup plus reformat", place a LOOKUP FILE component and a REFORMAT component in a graph.
When is "lookup plus reformat" better than a JOIN?
Use "lookup plus reformat" to merge datasets when:
All but one of the inputs are small enough to fit into memory.
The joining expression is complex and uses several lookup tables.
The joining expression involves intervals or pattern matching.
The lookup file will not grow significantly over time. (If a lookup file exceeds the memory limits, the graph fails.)
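The idea behind "lookup plus reformat" can be sketched outside Ab Initio. The Python below is only an illustration, not Ab Initio DML: the names country_names, band_starts, amount_band, reformat and merge are hypothetical, and the data is made up. It shows a small dataset held in memory, an interval lookup, and a per-record transform that consults both while the large input streams through.

```python
import bisect

# Hypothetical small dataset: loaded fully into memory, like a lookup file.
country_names = {"US": "United States", "DE": "Germany", "IN": "India"}

# Hypothetical interval table: sorted lower bounds and the band each one starts.
band_starts = [0, 100, 1000]
band_labels = ["small", "medium", "large"]


def amount_band(amount):
    """Interval lookup: find the last band whose lower bound is <= amount."""
    index = bisect.bisect_right(band_starts, amount) - 1
    return band_labels[index] if index >= 0 else None


def reformat(record):
    """Per-record transform that consults both in-memory lookups."""
    return {
        **record,
        "country_name": country_names.get(record["country"]),  # None when no match
        "band": amount_band(record["amount"]),
    }


def merge(large_input):
    """Stream the large input one record at a time; it is never sorted or held in memory."""
    for record in large_input:
        yield reformat(record)


orders = [
    {"id": 1, "country": "DE", "amount": 250},
    {"id": 2, "country": "FR", "amount": 5},
]
merged = list(merge(orders))
```

Note that every record of the large input comes out exactly once, matched or not. That is why this pattern suits lookups inside a complex expression but cannot produce a full outer join.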
When is a JOIN better than "lookup plus reformat"?
Use a JOIN component to merge datasets when:
You need to make the graph easy to read and understand.
You need to perform a full outer join or semi-join.
The lookup file may grow significantly over time.
One of your non-driving inputs is too large to fit into memory.
The driving parameter specifies the port that receives the largest input. The parameter is available only when the sorted-input parameter is set to "In memory: Input need not be sorted". The driving input flows past the other inputs' records without being sorted. All other inputs' records are read into memory.
For an in-memory join, the component loads as much of the non-driving inputs into memory as it can (per the max-core parameter). What doesn't fit into memory is landed to disk. (This may be OK if you'd rather pay the cost in disk space instead of memory.)
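The roles of the driving and non-driving inputs can be pictured as a hash join. The Python below is a rough sketch under that assumption, not how the JOIN component is actually implemented: in_memory_join and the sample records are hypothetical, and the max-core limit and the spill to disk are not modelled.

```python
from collections import defaultdict


def in_memory_join(driving, non_driving, key):
    """Inner join that streams `driving` and holds `non_driving` in memory."""
    # Build phase: read every non-driving record into a hash table keyed on the join key.
    table = defaultdict(list)
    for record in non_driving:
        table[record[key]].append(record)

    # Probe phase: the driving input flows past the table in arrival order, unsorted.
    for record in driving:
        for match in table.get(record[key], []):
            yield {**match, **record}


transactions = [  # largest input, so it drives
    {"cust_id": 7, "amount": 30},
    {"cust_id": 3, "amount": 12},
    {"cust_id": 9, "amount": 5},
]
customers = [  # small input, read into memory
    {"cust_id": 3, "name": "Ada"},
    {"cust_id": 7, "name": "Grace"},
]
joined = list(in_memory_join(transactions, customers, "cust_id"))
```

Memory use here depends only on the size of the non-driving input, which is why the largest input should be the driving one.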