
Three Error Handling Strategies in Talend Open Studio

You can recover from some errors. Others, like system or network failures, are fatal. But even in the fatal case, your Talend Open Studio job should die gracefully, notifying the operations team and leaving the data in a good state. This post presents three error handling strategies for your Talend jobs.

Some Talend Open Studio job errors are alternate paths that, though infrequent, occur often enough to justify special programming. This programming may take the form of guard conditions: special logic that routes the special case to another subjob. For an example of this type of error, see this blog post on ETL Filter Patterns.
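
As a rough illustration of a guard condition, here is a plain-Java sketch. It is not code generated by Talend; the class name, the `isValid` rule, and the customer ids are all hypothetical. It shows rows being split into a main flow and a reject flow, the way a filter with two outputs would feed two different subjobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java analogy of a guard condition; not Talend-generated code. */
final class GuardConditionSketch {

    /** Hypothetical guard: a row qualifies for the main flow only if its id is non-blank. */
    static boolean isValid(String customerId) {
        return customerId != null && !customerId.trim().isEmpty();
    }

    /** Returns two lists: index 0 is the main flow, index 1 is the reject flow. */
    static List<List<String>> route(List<String> customerIds) {
        List<String> main = new ArrayList<>();
        List<String> rejects = new ArrayList<>();
        for (String id : customerIds) {
            if (isValid(id)) {
                main.add(id);      // normal processing continues with this row
            } else {
                rejects.add(id);   // special case is steered to another subjob
            }
        }
        return Arrays.asList(main, rejects);
    }

    public static void main(String[] args) {
        List<List<String>> flows = route(Arrays.asList("C001", "", null, "C002"));
        System.out.println("main=" + flows.get(0));
        System.out.println("rejects=" + flows.get(1));
    }
}
```

The point of the sketch is that the blank and missing ids are expected cases with their own path, so no exception is thrown for them.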

Other errors are related to system and network activity or are bugs.  There are a few ways to handle this class of error in Talend Open Studio.

Do Nothing

For simple jobs, say an automated administrative task, you can rely on the exceptions that Talend Open Studio throws. An example is a simple input-to-output job in which a database failure while writing the output results in a system error. The error appears in the Run view as a red stack trace.

Simple Job with No Extra Error Handling Configured

Subjob or Component Error Triggers

Each subjob and component has a return code that can drive additional processing. The OnSubjobOk/OnSubjobError and OnComponentOk/OnComponentError triggers can be used to steer the error toward an error handling routine like the tSendMail component. This example looks for a connection error (the database is off) or a file processing error (the database is on, but the table name is wrong).

Both an individual subjob and a finer-grain component can be tested.  The screenshot shows two tSendMail routines being called from an OnSubjobError trigger.

Error Handling Tailored to the Subjob (or Component)

While testing the individual subjobs and components has the advantage of providing error handling tied to the specific case, there are disadvantages in maintenance and testing. Maintenance suffers because the job becomes cluttered with extra components, which makes it hard to tell the normal processing, the less frequent processing, and the error handling apart. Testing is harder because there are more test cases.

Sometimes, there is a need for this level of detail. You may want to send a file that represents an intermediate stage of processing via email. This file isn't available throughout the job, so not every failure handler can attach it.
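
The trigger-per-subjob approach in this section can be pictured in plain Java as follows. This is only an analogy under stated assumptions, not what Talend generates: the `Subjob` interface, the `runWithHandler` helper, and the `sentMail` list (a stand-in for tSendMail) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java analogy of OnSubjobOk / OnSubjobError triggers; not Talend-generated code. */
final class SubjobTriggerSketch {

    /** A subjob is modelled as a task that may throw. */
    interface Subjob {
        void run() throws Exception;
    }

    /** Collects notifications instead of sending real mail (stand-in for tSendMail). */
    static final List<String> sentMail = new ArrayList<>();

    /** Runs one subjob; on failure, calls the handler wired to that specific subjob. */
    static boolean runWithHandler(String name, Subjob subjob) {
        try {
            subjob.run();
            return true;                                        // "OnSubjobOk" path
        } catch (Exception e) {
            sentMail.add(name + " failed: " + e.getMessage());  // "OnSubjobError" path
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated connection subjob: the database is off.
        boolean connected = runWithHandler("connect", () -> {
            throw new IllegalStateException("database is off");
        });
        // The load subjob runs only if the connection subjob succeeded.
        if (connected) {
            runWithHandler("load", () -> {
                throw new IllegalArgumentException("table name is wrong");
            });
        }
        System.out.println(sentMail);
    }
}
```

Each subjob needs its own call to `runWithHandler`, which mirrors the extra components and extra test cases that this strategy adds to a job.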

tAssertCatcher

A more general strategy is to define an error handling subjob to be performed when an error -- any error -- occurs.  This has the important advantage of consolidating the error handling, dramatically reducing testing.  It puts the burden of testing for error conditions on Talend (where it belongs).

To implement the general strategy, use the tAssertCatcher component, which is invoked whenever any component throws an error.

A Shared Error Handler with tAssertCatcher

If there's a failure in the XSL component (tXSLT) or another component resulting in a Java exception, the job will continue with the error handler (in this case a tLogRow) attached to the tAssertCatcher. tAssertCatcher can route an error message to other handlers like a tSendMail.
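
A plain-Java way to think about the shared handler is a single catch block around the whole job. The sketch below is an analogy only, not Talend's implementation; `Component`, `runJob`, and the `caught` list (standing in for the tLogRow behind the catcher) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java analogy of one shared error handler; not Talend-generated code. */
final class SharedCatcherSketch {

    /** A component is modelled as a step that may throw. */
    interface Component {
        void run() throws Exception;
    }

    /** Messages received by the single shared handler. */
    static final List<String> caught = new ArrayList<>();

    /** Runs the components in order; the first exception goes to the one shared handler. */
    static boolean runJob(List<Component> components) {
        try {
            for (Component component : components) {
                component.run();
            }
            return true;
        } catch (Exception e) {
            // One handler for any error from any component.
            caught.add(e.getClass().getSimpleName() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        List<Component> job = new ArrayList<>();
        job.add(() -> System.out.println("read input"));
        job.add(() -> { throw new IllegalStateException("transform failed"); });
        job.add(() -> System.out.println("write output")); // not reached after the failure
        runJob(job);
        System.out.println(caught);
    }
}
```

Compared with wiring a handler to each subjob, there is only one error path to test here, whichever component fails.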

tAssertCatcher Config

Components like tXSLT don't need any additional configuration to use tAssertCatcher. The tFileInputXML component has a "Die on error" checkbox that needs to be set.

In the following screenshot, the database component tMSSqlOutput_1 has "Die on error" set.  If the flag is not set, then the tMSSqlOutput will print a message and the tAssertCatcher will not be called.  This particular example caught errors from the connection component (bad login) and the tMSSqlOutput component (DB-generated unique constraint violation and invalid insert of identity column).

An Example with Database Components
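
The effect of the "Die on error" flag can be sketched in plain Java. This is a simplified model under stated assumptions, not the behavior of the real tMSSqlOutput code: `writeRow`, the in-memory `table`, and the `console` and `caught` lists are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java analogy of the "Die on error" checkbox; not Talend-generated code. */
final class DieOnErrorSketch {

    /** Messages that reached the shared catcher. */
    static final List<String> caught = new ArrayList<>();

    /** Messages the component only printed. */
    static final List<String> console = new ArrayList<>();

    /** Hypothetical output component: inserts a key and fails on a duplicate. */
    static void writeRow(Set<Integer> table, int key, boolean dieOnError) {
        if (table.add(key)) {
            return; // row inserted
        }
        String message = "unique constraint violated for key " + key;
        if (dieOnError) {
            throw new IllegalStateException(message); // propagates to the catcher
        }
        console.add(message); // flag off: message printed, catcher never called
    }

    /** Runs the job with one shared catch block, like a catcher subjob. */
    static void runJob(int[] keys, boolean dieOnError) {
        Set<Integer> table = new HashSet<>();
        try {
            for (int key : keys) {
                writeRow(table, key, dieOnError);
            }
        } catch (RuntimeException e) {
            caught.add(e.getMessage());
        }
    }

    public static void main(String[] args) {
        runJob(new int[] {1, 1, 2}, false);
        runJob(new int[] {1, 1, 2}, true);
        System.out.println("console=" + console);
        System.out.println("caught=" + caught);
    }
}
```

With the flag off, the duplicate is reported on the console and the remaining rows are still written; with the flag on, the same duplicate stops the job and reaches the catcher.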


Let Talend Work

Handling system errors is different from handling the alternate paths and conditions that arise while coding a Talend job. Sometimes, you'll have a specific error routine for a specific system error condition. But where possible, let Talend throw the system errors and catch them with a tAssertCatcher.

Comments

  1. Copycat from http://bekwam.blogspot.fr/2011/04/three-error-handling-strategies-in.html !!
