Optimize the tMap component in Talend
If the lookup table has few rows:
- Open tMap and set the lookup model to Load Once. This loads the lookup data one time, before the main flow starts.
- The lookup data is held in memory, so the main flow executes very fast, comparing each main row against the in-memory lookup data (see the sketch below).
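Conceptually, Load Once amounts to building an in-memory hash map from the lookup table and probing it for every main-flow row. Here is a minimal, hand-written Java sketch of that idea; the class and sample data are illustrative, not Talend's generated code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of what the "Load Once" lookup model does conceptually:
// the lookup table is read once into an in-memory map, then every main-flow
// row is matched with an O(1) hash probe.
public class LoadOnceLookupSketch {
    public static void main(String[] args) {
        // 1. Load the lookup data once, before the main flow starts.
        Map<Integer, String> lookup = new HashMap<>();
        lookup.put(1, "Alpha");
        lookup.put(2, "Beta");

        // 2. Main flow: each incoming row probes the map in memory.
        int[] mainFlowKeys = {2, 1, 3};
        for (int key : mainFlowKeys) {
            String match = lookup.get(key);           // fast in-memory join
            System.out.println(key + " -> " + match); // null = no match (left join)
        }
    }
}
```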
If the lookup table data is very large:
Talend cannot store the lookup table data in memory, and you will get a Java heap space exception. To resolve this issue, follow the steps below:
- Open tMap.
- Go to the lookup table.
- Click on tMap settings.
- Set the Store temp data property to True.
- Click OK.
- In the tMap component's Basic settings, set the Temp data directory path by browsing to a folder.
- Go to Advanced settings.
- Set Max buffer size (nb of row) to a value based on the lookup condition's data type (the sketch below shows the buffering idea).
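The effect of these settings is that lookup rows are buffered in memory only up to the configured row count and are then spilled to files under the temp data directory. This hedged sketch mimics that spill behavior; the chunk file names and the semicolon-delimited row format are assumptions, not Talend's actual on-disk format:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch of the idea behind "Store temp data" + "Max buffer size (nb of row)":
// instead of holding the whole lookup in the heap, rows are buffered and
// spilled to disk once the buffer fills, so heap usage stays bounded.
public class SpillToDiskSketch {
    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("tmap_lookup_");
        int maxBufferSize = 3;                 // "Max buffer size (nb of row)"
        List<String> buffer = new ArrayList<>(maxBufferSize);
        int chunk = 0;

        for (int row = 1; row <= 8; row++) {   // pretend lookup source
            buffer.add("key" + row + ";value" + row);
            if (buffer.size() == maxBufferSize) {
                Files.write(tempDir.resolve("chunk_" + (chunk++) + ".tmp"), buffer);
                buffer.clear();                // heap stays bounded by the buffer
            }
        }
        if (!buffer.isEmpty()) {               // flush the final partial buffer
            Files.write(tempDir.resolve("chunk_" + chunk + ".tmp"), buffer);
        }
        System.out.println("Spilled lookup chunks to " + tempDir);
    }
}
```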
In my jobs, a lookup over 2 million records of integer data type ran successfully with 3 GB of memory.
Any ideas on what "a value based on the lookup condition's data type" would be? Or what it should be based on?
I have a lookup of 25 million records, but only 2 columns. Currently the buffer size = 100,000.
As per my understanding,
if you want to look up on a range of values, there is the tIntervalMatch component in Talend.
tIntervalMatch: using this component you can look up on a range of values (like a BETWEEN operation).
For this component you need to specify the min value, max value, and lookup column (a small sketch of the idea follows).
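For intuition, a range lookup of this kind just checks, for each main-flow value, which lookup row's [min, max] interval contains it. A hedged Java sketch, where the Range record and the sample intervals are made up for illustration:

```java
// Illustrative sketch of a range ("between") lookup like tIntervalMatch
// performs: each lookup row carries a min and a max, and a main-flow value
// matches the row whose interval contains it. Requires Java 16+ for records.
public class IntervalMatchSketch {
    record Range(int min, int max, String label) {}

    public static void main(String[] args) {
        Range[] ranges = {
            new Range(0, 17, "minor"),
            new Range(18, 64, "adult"),
            new Range(65, 120, "senior")
        };
        int value = 42; // main-flow lookup value
        for (Range r : ranges) {
            if (value >= r.min() && value <= r.max()) { // the "between" condition
                System.out.println(value + " -> " + r.label());
            }
        }
    }
}
```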
Thanks,
Raja K
Hi Patrick,
Can you describe your scenario in more detail?
Another scenario for optimizing tMap:
If the lookup table is very large and the main table is very small, we can do it another way:
you can do the lookup for each row in your main table.
This does not load your lookup table into memory.
For each record in the main flow, it fires one SELECT query against the lookup table.
Steps:
* In the tMap settings, set the lookup model to Reload at each row (cache) instead of Load once.
* Then set the expr. key and column (the lookup condition columns); a sketch of the per-row query pattern follows.
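In plain JDBC terms, Reload at each row boils down to one parameterized SELECT per main-flow record, as in this hedged sketch. The connection URL, credentials, table, and column names are placeholders for your own lookup source:

```java
import java.sql.*;

// Sketch of what "Reload at each row" amounts to: for every main-flow record,
// one parameterized SELECT is fired against the lookup table instead of
// loading the whole table up front.
public class ReloadAtEachRowSketch {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/mydb"; // assumed connection
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT value_col FROM lookup_table WHERE key_col = ?")) {

            int[] mainFlowKeys = {101, 102, 103};   // pretend main flow
            for (int key : mainFlowKeys) {
                ps.setInt(1, key);                  // the expr. key from the main row
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(key + " -> " + rs.getString(1));
                    }
                }
            }
        }
    }
}
```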
Hi, your post is very useful for me. I have a question: if we have 70 million rows in the main table, two lookup tables each have 10 million rows, and the memory is 8 GB, what settings will be best in this case? Thank you.
Hi Liz,
You need to:
1. Set the Store temp data property to True for the two 10 million row lookup tables.
2. Optimize your job by tuning Max buffer size (nb of row). Rerun your job while increasing the max buffer size in steps of 500K or 1 million rows, and you will find the value that best fits your memory (a rough sizing sketch follows).
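As a starting point before the empirical tuning, you can estimate an upper bound from the heap you can spare and the approximate footprint of one lookup row. The 100-byte row estimate below is purely an assumption; measure your own rows (key plus columns plus object overhead):

```java
// Back-of-the-envelope sketch for picking "Max buffer size (nb of row)":
// estimate the bytes one lookup row occupies and divide the heap you can
// spare by that figure, then tune empirically from there.
public class BufferSizeEstimateSketch {
    public static void main(String[] args) {
        long spareHeapBytes = 2L * 1024 * 1024 * 1024; // e.g. 2 GB left for the lookup
        long bytesPerRow = 100;                        // assumed average row footprint
        long maxBufferRows = spareHeapBytes / bytesPerRow;
        System.out.println("Rough upper bound: " + maxBufferRows + " rows");
        // Then run the job, raising the buffer in 500K-1M row steps as
        // suggested above, and watch heap usage and run time.
    }
}
```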
Thank you Raja. I started storing temp files on disk, and it works well, but the only problem I faced is that the temp files are not deleted automatically. The folder is huge now. Do you have any idea about it?
Thank you!!
Hi BeardoraemoN,
Yes, we have this problem in Talend. There is a workaround: use the tFileDelete component once your job is completed. This component also has an option for deleting a complete folder.
Steps:
1. Provide a specific path for the "Temp data directory path" property in tMap.
2. Add a tFileDelete component and connect it to your job with an OnSubjobOk trigger.
3. Check the Delete Folder option in the tFileDelete component.
Then your folder will be deleted after your job completes, and you will get the space back (a sketch of the equivalent cleanup follows).
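What the Delete Folder option effectively does is a recursive removal of the directory. A hedged Java equivalent, with the temp path as a placeholder for whatever you set in tMap:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch of what tFileDelete's "Delete Folder" option effectively does at the
// end of the job: recursively remove the temp data directory.
public class DeleteTempFolderSketch {
    public static void main(String[] args) throws IOException {
        Path tempDir = Paths.get("/tmp/tmap_temp_data"); // assumed temp path
        if (Files.exists(tempDir)) {
            try (Stream<Path> walk = Files.walk(tempDir)) {
                walk.sorted(Comparator.reverseOrder())   // delete children before parents
                    .forEach(p -> p.toFile().delete());
            }
        }
        System.out.println("Temp folder cleaned up.");
    }
}
```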
thanks,
Raja K
Hi Raja,
Thank you for the quick reply. I have another question: if the path for storing temp files does not exist, will it be created automatically?
Thank you!
Hi BeardoraemoN,
This temp file path is created automatically by the tMap component.
I have tested this and it works fine in TDI 5.0.
thanks,
Raja K
I have a lookup with 4 crore (40 million) records and the main flow is 6 lakh (600,000); could you suggest the flow?
How can we optimize tMap in a big data batch job?
My job produces the output when run with a table of 2 million records. Everything works fine.
But when the same job is applied to a bigger HBase table of 16+ million records, the job runs fine but no output is produced in the output folder.
What could the issue be?
Is this a memory issue?