About Talend

Talend Open Studio is a code generator: data transformation scripts and the underlying programs can be generated in either Java or Perl. Its GUI is made up of a metadata repository and a graphical designer. The metadata repository contains the definitions and configuration for each job, and this information is used by all of the components of Talend Open Studio.

Course Details : http://talend-training.blogspot.in/2013/04/talend-training-course-details.html

How to Optimize the tMap Component in Talend


If the lookup table has few rows:

  • Open tMap and set the lookup model to Load Once. This loads the lookup data one time, before the main flow starts.
  • The lookup data is then held in memory, so matching the main flow against it is very fast.
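Conceptually, Load Once behaves like building an in-memory map once and probing it per main-flow row. The sketch below is illustrative only (the class and method names are made up, not Talend's generated code or API):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Conceptual sketch of tMap's "Load Once" lookup model (not Talend API):
 * the lookup table is read a single time into an in-memory map before the
 * main flow starts, so each main-flow row is matched with one fast probe.
 */
public class LoadOnceLookup {
    private final Map<Integer, String> lookup = new HashMap<>();
    private int loadCount = 0; // how many times the lookup source was read

    // Runs once, before the main flow starts, like "Load Once" in tMap.
    public void load(int[] keys, String[] values) {
        loadCount++;
        for (int i = 0; i < keys.length; i++) {
            lookup.put(keys[i], values[i]);
        }
    }

    // Each main-flow row probes the in-memory map; no re-read of the source.
    public String probe(int key) {
        return lookup.get(key);
    }

    public int getLoadCount() {
        return loadCount;
    }
}
```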

If the lookup table data is very large:
Talend cannot hold the lookup table data in memory, and you will get a java heap space exception. To resolve this issue, follow the steps below:

  • Open tMap.
  • Go to the lookup table.
  • Click on tMap settings.
  • Set the value of the Store temp data property to true.
  • Click OK.
  • In the tMap Basic settings, set the Temp data directory path by browsing to a folder.
  • Go to Advanced settings.
  • Set Max buffer size (nb of rows) to a value based on the lookup condition's data type.
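The idea behind these settings can be sketched as a buffer that keeps at most "max buffer size" rows in memory and spills the rest to a file under the temp data directory. This is a minimal illustration of the concept, not Talend's internal implementation; all names here are invented:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not Talend internals) of the "store temp data" idea:
 * keep at most maxBufferSize lookup rows in memory and append the rest to a
 * file under the temp data directory, trading speed for heap space.
 */
public class SpillingBuffer {
    private final int maxBufferSize;
    private final Path tempFile;
    private final List<String> inMemory = new ArrayList<>();
    private long spilled = 0;

    public SpillingBuffer(int maxBufferSize, Path tempDir) throws IOException {
        this.maxBufferSize = maxBufferSize;
        Files.createDirectories(tempDir);          // create temp dir if missing
        this.tempFile = tempDir.resolve("lookup_spill.txt");
        Files.deleteIfExists(tempFile);
        Files.createFile(tempFile);
    }

    public void add(String row) throws IOException {
        if (inMemory.size() < maxBufferSize) {
            inMemory.add(row);                     // fits in the in-memory buffer
        } else {
            // Buffer is full: append the row to the temp file on disk.
            Files.write(tempFile, List.of(row), StandardOpenOption.APPEND);
            spilled++;
        }
    }

    public int inMemoryCount() { return inMemory.size(); }
    public long spilledCount() { return spilled; }
}
```

A larger buffer means fewer disk writes but more heap use, which is why the right value depends on the size of the lookup condition's data type.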

In my jobs, a lookup on 2 million records of integer data type ran successfully with 3 GB of memory.

Comments:

  1. Any ideas on what "some value based on lookup condition data type" would be? Or what it should be based on?

    I have a lookup of 25 million records, but only 2 columns. Currently the buffer size = 100,000.

    Replies
    1. As per my understanding,

      if you want to look up on a range of values, Talend has the tIntervalMatch component.

      tIntervalMatch: using this component you can look up on a range of values (like a BETWEEN operation). For this component you need to specify the min value, the max value, and the lookup column.

      Thanks,
      Raja K
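An interval match of this kind can be sketched as follows. This is a conceptual illustration of a BETWEEN-style lookup, not the tIntervalMatch component itself; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of an interval (range) lookup in the spirit of
 * tIntervalMatch: each lookup row carries a min and a max, and a main-flow
 * value matches a row whose range contains it (like a BETWEEN operation).
 */
public class IntervalLookup {
    private final List<long[]> ranges = new ArrayList<>(); // {min, max} pairs
    private final List<String> labels = new ArrayList<>();

    public void addRange(long min, long max, String label) {
        ranges.add(new long[] {min, max});
        labels.add(label);
    }

    // Returns the label of the first range containing the value, or null.
    public String match(long value) {
        for (int i = 0; i < ranges.size(); i++) {
            long[] r = ranges.get(i);
            if (value >= r[0] && value <= r[1]) {
                return labels.get(i);
            }
        }
        return null;
    }
}
```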

  2. Hi Patrick,

    Can you describe your scenario in more detail?

    Another scenario for optimizing tMap:

    If the lookup table is very large and the main table is very small, we can do it another way:
    do the lookup for each row in your main table.
    This does not load the whole lookup table.
    For each record in the main flow --> it fires one SELECT query on the lookup table.

    Steps:
    * In the tMap settings, set the property Lookup Model = Reload at each row (cache) instead of Load once.

    * Then set the expression key and column (the lookup condition columns).
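The contrast with Load Once can be sketched like this: instead of one bulk load, each main-flow row triggers one keyed query against the lookup source. The `Function` below stands in for the per-row SELECT; none of these names are Talend API:

```java
import java.util.function.Function;

/**
 * Conceptual sketch of tMap's "Reload at each row" lookup model: the lookup
 * table is never loaded up front; each main-flow row fires one keyed query
 * against the lookup source (represented here by a Function).
 */
public class ReloadAtEachRow {
    private final Function<Integer, String> lookupQuery;
    private int queryCount = 0; // one query fired per main-flow row

    public ReloadAtEachRow(Function<Integer, String> lookupQuery) {
        this.lookupQuery = lookupQuery;
    }

    // Processing a main-flow row issues exactly one lookup query for its key.
    public String processRow(int key) {
        queryCount++;
        return lookupQuery.apply(key);
    }

    public int getQueryCount() {
        return queryCount;
    }
}
```

This keeps memory flat regardless of lookup-table size, at the cost of one query per main-flow row, which is why it suits a huge lookup table with a small main flow.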

  3. Hi, your post is very useful for me. I have a question: if we have 70 million rows in the main table and two lookup tables with 10 million rows each, and the memory is 8 GB, what settings would be best in this case? Thank you.

  4. Hi Liz,

    You need to:
    1. Set the value of the Store temp data property to true for the two 10-million-row lookup tables.

    2. You can optimize your job by tuning
    Max buffer size (nb of rows). Run your job, increasing the max buffer size by 1 million or 500K each time, and you will find the optimal value for your memory.

    Replies
    1. Thank you Raja, I started to store temp files on disk and it works well, but the only problem I face is that the temp files are not deleted automatically. The folder is huge now. Do you have any idea about it?
      Thank you!!

  5. Hi BeardoraemoN,

    Yes, we have this problem in Talend. There is a workaround for it: use the tFileDelete component once your job is completed. This component also has an option for deleting a complete folder.

    Steps:
    1. Provide a specific path for the "Temp data directory path" property in tMap.
    2. Add a tFileDelete component and connect it to your job with an On Subjob Ok (trigger) link.
    3. Check the Delete folder option in the tFileDelete component.

    Then your folder will be deleted after your job completes, and you will get the space back.

    Thanks,
    Raja K
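What tFileDelete's delete-folder option accomplishes can be sketched as a recursive delete of the temp data directory, files first, then the folders. This is a plain Java illustration of the cleanup step, not the component's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Sketch of the cleanup that tFileDelete's "delete folder" option performs:
 * remove the tMap temp data directory after the job finishes, so the spilled
 * lookup files do not pile up on disk.
 */
public class TempDirCleaner {
    public static void deleteFolder(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Deepest entries first, so files are deleted before their
            // parent folders.
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
        }
    }
}
```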


    Replies
    1. Hi Raja,
      Thank you for the quick reply. I have another question: if the path for storing the temp files does not exist, will it be created automatically?
      Thank you!

    2. Hi BeardoraemoN,

      The temp file path is created automatically by the tMap component.

      I have tested this working fine in TDI 5.0.

      Thanks,
      Raja K



  7. I have a lookup of 4 crore (40 million) rows and a main flow of 6 lakh (600,000) rows. Could you suggest the flow?
  11. How can we optimize the tMap in a big data batch job?
    My job produces output when run with a table of 2 million records; everything works fine.
    But when the same job is applied to a bigger HBase table of 16+ million records, the job runs fine but no output is produced in the output folder.
    What could be the issue?
    Is this a memory issue?
