Skip to main content

Running Count in Talend Open Studio

Most Talend components keep a count of the records processed using variables like NB_LINE or NB_LINE_OK.  But these are only available after all processing is completed.  Define your own counter variable to keep a running count for use in a tMap.

Variables like tFilterRow.NB_LINE or tAccessOutput.NB_LINE_INSERTED can be used to report the number of affected lines after a subjob's processing.  However, it may be of use to get the current line index for use in a tMap.  The index variables used to form NB_LINE aren't available during processing; they're only written out the globalMap at the end of processing.

In this example, staging records are loaded from Excel to Access.  The order in which the Excel records are read is preserved in a database column called DISPLAY_SEQ_NB.  Note that there is an auto-increment column used for record ID in the Access table.  This could be used to infer a loading order, but this job uses a separate column to keep the ID as a meaningless surrogate key to help with maintenance later.  (I can swap in a record at the same DISPLAY_SEQ_NB without having to work against the auto-incrementing mechanism.)

Talend Staging Job Using a Counter
Step 1: Define the Counter Variable

To define the counter variable, use a tSetGlobalVar.  Define a global with an initial value.  In this case, the job uses an unquoted 0 to set it as an Integer which will support an increment later.

tSetGlobalVar

Step2: Use the Variable

Use the variable in a tMap.  Retrieve the value using the globalMap and cast to the Integer type.

tMap Using budgetFileCounter Variable

Step 3: Increment the Counter

Use a tJavaRow to increment the counter.  First, use the "Generate Code" feature to pass the input fields directly to the output.  Next, add a line of Java code that unpacks the variable stored in the globalMap into a Java primitive type that can be manipulated.

Incrementing budgetFileCounter Variable
 Most component report the outcome of their processing using CID-named global variables like 'tFilterRow.NB_LINES_OK'.  However, these variables are only available after the processing has been completed.  If you want to keep a running count, set your own variable.

Comments

Popular posts from this blog

ODI KM Adding Order by Option

You can add Order by statement to queries by editing KM.I have edited IKM SQL Control Append to provide Order by.  1) Add an option to KM named USE_ORDER_BY, its type is Checkbox and default value is False. This option determines you want an order by statement at your query. 2)Add second option to KM named ORDER_BY, type is Text. You will get order by values to your query by this option. 3) Editing Insert New Rows detail of KM. Adding below three line code after having clause. That's it! <% if (odiRef.getOption("USE_ORDER_ BY").equals("1")) { %> ORDER BY <%=odiRef.getOption("ORDER_BY" )%> <%} %>  If USE_ORDER_BY option is not used, empty value of ORDER_BY option get error. And executions of KM appears as such below; At this execution, I checked the KM to not get errors if ORDER_BY option value is null. There is no prove of ORDER BY I'm glad.  Second execution to get  Ord...

Synchronous and Asynchronous execution in ODI

In data warehouse designing, an important step is to deciding which step is before/after. Newly added packages and required DW data must be analyzed carefully. Synchronous addings can lengthen ETL duration. Interfaces, procedures without generated scenario cannot be executed in parallel. Only scenario executions can be parallel in ODI. Default scenario execution is synch in ODI. If you want to set a scenario to executed in parallel then you will write “-SYNC_MODE=2″ on command tab or select Synchronous / Asynchronous option Asynchronous in General tab. I have created a package as interfaces executes as; INT_JOBS parallel  INT_REGIONS synch  INT_REGIONS synch  INT_COUNTRIES synch  INT_LOCATIONS parallel  INT_EMPLOYEES parallel (Interfaces are independent.) Selecting beginning and ending times and durations from repository tables as ODI 11g operator is not calculating these values. It is obvious in ODI 10g operator. SELECT    sess_no...

Oracle Data Integrator tools: OdiFileDelete and OdiOutFile

Hello everyone! It’s time for another cool ODI tutorial. Last time, I spoke about the   OdiZip tool and how it can be used to create zip files from a directory. Through this post, I will talk about two more tools related to  Files  namely  OdiFileDelete and  OdiOutFile . 1. OdiFileDelete The  OdiFileDelete  is a tool used to delete files present in a directory or a complete directory on the machine running the agent. Usage OdiFileDelete -DIR=<dir> | -FILE=<file> [-RECURSE=<yes|no>] [-CASESENS=<yes|no>] [-NOFILE_ERROR=<yes|no>] [-FROMDATE=<fromdate>] [-TODATE=<todate>] If  -FROMDATE  is omitted, all files with a modification date earlier than the  -TODATE  date will be deleted. If  -TODATE  is omitted, all files with a modification date later than the  -FROMDATE  date will be deleted. If both parameters are omitted, all files matching the  -FILE...