Optimize the tMap component in Talend
If the lookup table has few rows:
- Open tMap and set the lookup model to Load Once. This loads the lookup data one time, before the main flow starts.
- The lookup data is held in memory, so the main flow executes very fast, comparing each main row against the in-memory lookup data (see the sketch below).
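Conceptually, Load Once amounts to building an in-memory hash map from the lookup table and probing it for every main-flow row. Here is a minimal, hand-written Java sketch of that idea; the class and sample data are illustrative, not Talend's generated code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of what the "Load Once" lookup model does conceptually:
// the lookup table is read once into an in-memory map, then every main-flow
// row is matched with an O(1) hash probe.
public class LoadOnceLookupSketch {
    public static void main(String[] args) {
        // 1. Load the lookup data once, before the main flow starts.
        Map<Integer, String> lookup = new HashMap<>();
        lookup.put(1, "Alpha");
        lookup.put(2, "Beta");

        // 2. Main flow: each incoming row probes the map in memory.
        int[] mainFlowKeys = {2, 1, 3};
        for (int key : mainFlowKeys) {
            String match = lookup.get(key);           // fast in-memory join
            System.out.println(key + " -> " + match); // null = no match (left join)
        }
    }
}
```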
If the lookup table data is very large:
Talend cannot store the lookup table data in memory, and you will get a Java heap space exception. To resolve this issue, follow the steps below:
- Open tMap.
- Go to the lookup table.
- Click on tMap settings.
- Set the Store temp data property to True.
- Click OK.
- In the tMap component's Basic settings, set the Temp data directory path by browsing to a folder.
- Go to Advanced settings.
- Set Max buffer size (nb of row) to a value based on the lookup condition's data type (the sketch below shows the buffering idea).
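The effect of these settings is that lookup rows are buffered in memory only up to the configured row count and are then spilled to files under the temp data directory. This hedged sketch mimics that spill behavior; the chunk file names and the semicolon-delimited row format are assumptions, not Talend's actual on-disk format:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch of the idea behind "Store temp data" + "Max buffer size (nb of row)":
// instead of holding the whole lookup in the heap, rows are buffered and
// spilled to disk once the buffer fills, so heap usage stays bounded.
public class SpillToDiskSketch {
    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("tmap_lookup_");
        int maxBufferSize = 3;                 // "Max buffer size (nb of row)"
        List<String> buffer = new ArrayList<>(maxBufferSize);
        int chunk = 0;

        for (int row = 1; row <= 8; row++) {   // pretend lookup source
            buffer.add("key" + row + ";value" + row);
            if (buffer.size() == maxBufferSize) {
                Files.write(tempDir.resolve("chunk_" + (chunk++) + ".tmp"), buffer);
                buffer.clear();                // heap stays bounded by the buffer
            }
        }
        if (!buffer.isEmpty()) {               // flush the final partial buffer
            Files.write(tempDir.resolve("chunk_" + chunk + ".tmp"), buffer);
        }
        System.out.println("Spilled lookup chunks to " + tempDir);
    }
}
```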
In my jobs, a lookup over 2 million records of integer data type ran successfully with 3 GB of memory.
Any ideas on what "a value based on the lookup condition's data type" would be? Or what it should be based on?
I have a lookup of 25 million records, but only 2 columns. Currently the buffer size = 100,000.
As per my understanding,
if you want to look up on a range of values, there is the tIntervalMatch component in Talend.
tIntervalMatch: using this component you can look up on a range of values (like a BETWEEN operation).
For this component you need to specify the min value, max value, and lookup column (a small sketch of the idea follows).
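For intuition, a range lookup of this kind just checks, for each main-flow value, which lookup row's [min, max] interval contains it. A hedged Java sketch, where the Range record and the sample intervals are made up for illustration:

```java
// Illustrative sketch of a range ("between") lookup like tIntervalMatch
// performs: each lookup row carries a min and a max, and a main-flow value
// matches the row whose interval contains it. Requires Java 16+ for records.
public class IntervalMatchSketch {
    record Range(int min, int max, String label) {}

    public static void main(String[] args) {
        Range[] ranges = {
            new Range(0, 17, "minor"),
            new Range(18, 64, "adult"),
            new Range(65, 120, "senior")
        };
        int value = 42; // main-flow lookup value
        for (Range r : ranges) {
            if (value >= r.min() && value <= r.max()) { // the "between" condition
                System.out.println(value + " -> " + r.label());
            }
        }
    }
}
```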
Thanks,
Raja K
Hi Patrick,
Can you describe your scenario in more detail?
Another scenario for optimizing tMap:
If the lookup table is very large and the main table is very small, we can do it another way:
you can do the lookup for each row in your main table.
This does not load your lookup table into memory.
For each record in the main flow, it fires one SELECT query against the lookup table.
Steps:
* In the tMap settings, set the lookup model to Reload at each row (cache) instead of Load once.
* Then set the expr. key and column (the lookup condition columns); a sketch of the per-row query pattern follows.
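In plain JDBC terms, Reload at each row boils down to one parameterized SELECT per main-flow record, as in this hedged sketch. The connection URL, credentials, table, and column names are placeholders for your own lookup source:

```java
import java.sql.*;

// Sketch of what "Reload at each row" amounts to: for every main-flow record,
// one parameterized SELECT is fired against the lookup table instead of
// loading the whole table up front.
public class ReloadAtEachRowSketch {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/mydb"; // assumed connection
        try (Connection conn = DriverManager.getConnection(url, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT value_col FROM lookup_table WHERE key_col = ?")) {

            int[] mainFlowKeys = {101, 102, 103};   // pretend main flow
            for (int key : mainFlowKeys) {
                ps.setInt(1, key);                  // the expr. key from the main row
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(key + " -> " + rs.getString(1));
                    }
                }
            }
        }
    }
}
```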
Hi, your post is very useful for me. I have a question: if we have 70 million rows in the main table, two lookup tables each have 10 million rows, and the memory is 8 GB, what settings will be best in this case? Thank you.
Hi Liz,
You need to:
1. Set the Store temp data property to True for the two 10 million row lookup tables.
2. Optimize your job by tuning Max buffer size (nb of row). Rerun your job while increasing the max buffer size in steps of 500K or 1 million rows, and you will find the value that best fits your memory (a rough sizing sketch follows).
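As a starting point before the empirical tuning, you can estimate an upper bound from the heap you can spare and the approximate footprint of one lookup row. The 100-byte row estimate below is purely an assumption; measure your own rows (key plus columns plus object overhead):

```java
// Back-of-the-envelope sketch for picking "Max buffer size (nb of row)":
// estimate the bytes one lookup row occupies and divide the heap you can
// spare by that figure, then tune empirically from there.
public class BufferSizeEstimateSketch {
    public static void main(String[] args) {
        long spareHeapBytes = 2L * 1024 * 1024 * 1024; // e.g. 2 GB left for the lookup
        long bytesPerRow = 100;                        // assumed average row footprint
        long maxBufferRows = spareHeapBytes / bytesPerRow;
        System.out.println("Rough upper bound: " + maxBufferRows + " rows");
        // Then run the job, raising the buffer in 500K-1M row steps as
        // suggested above, and watch heap usage and run time.
    }
}
```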
Thank you Raja. I started storing temp files on disk, and it works well, but the only problem I faced is that the temp files are not deleted automatically. The folder is huge now. Do you have any idea about it?
Thank you!!
Hi BeardoraemoN,
Yes, we have this problem in Talend. There is a workaround: use the tFileDelete component once your job is completed. This component also has an option for deleting a complete folder.
Steps:
1. Provide a specific path for the "Temp data directory path" property in tMap.
2. Add a tFileDelete component and connect it to your job with an OnSubjobOk trigger.
3. Check the Delete Folder option in the tFileDelete component.
Then your folder will be deleted after your job completes, and you will get the space back (a sketch of the equivalent cleanup follows).
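What the Delete Folder option effectively does is a recursive removal of the directory. A hedged Java equivalent, with the temp path as a placeholder for whatever you set in tMap:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch of what tFileDelete's "Delete Folder" option effectively does at the
// end of the job: recursively remove the temp data directory.
public class DeleteTempFolderSketch {
    public static void main(String[] args) throws IOException {
        Path tempDir = Paths.get("/tmp/tmap_temp_data"); // assumed temp path
        if (Files.exists(tempDir)) {
            try (Stream<Path> walk = Files.walk(tempDir)) {
                walk.sorted(Comparator.reverseOrder())   // delete children before parents
                    .forEach(p -> p.toFile().delete());
            }
        }
        System.out.println("Temp folder cleaned up.");
    }
}
```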
thanks,
Raja K
Hi Raja,
Thank you for the quick reply. I have another question: if the path for storing temp files does not exist, will it be created automatically?
Thank you!
Hi BeardoraemoN,
This temp file path is created automatically by the tMap component.
I have tested this and it works fine in TDI 5.0.
thanks,
Raja K
I have a lookup with 4 crore (40 million) records and the main flow is 6 lakh (600,000); could you suggest the flow?
How can we optimize tMap in a big data batch job?
My job produces the output when run with a table of 2 million records. Everything works fine.
But when the same job is applied to a bigger HBase table of 16+ million records, the job runs fine but no output is produced in the output folder.
What could the issue be?
Is this a memory issue?