
Calling a Stored Procedure from Talend Open Studio

Writing to a set of related tables can present an ordering challenge for Talend Open Studio developers.  This post shows how to use a stored procedure to write to two related MS SQL Server tables.

Databases like MS SQL Server and MySQL use special identity columns to provide a unique auto-incremented value for a row.  This value is often used as a primary key.  Working with these columns presents a special challenge to Talend Open Studio developers because there may not be a business key that can be used to look up the generated id.

Related Tables

The tables Business and Contact are related by a foreign key 'businessId', which is the primary key of the Business table.  A Contact record can't be inserted without a valid foreign key.  Because 'businessName' is not unique, the generated businessId cannot be looked up after the fact.


Two Tables Related by an Auto Increment Primary Key
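
For reference, here is a minimal sketch of the two tables.  The column names and sizes are inferred from the stored procedure shown later in this post, so treat them as assumptions rather than the actual DDL.

CREATE TABLE Business (
  businessId INT IDENTITY(1,1) PRIMARY KEY,  -- auto-generated key
  businessName VARCHAR(50) NOT NULL,
  streetAddr1 VARCHAR(50),
  streetAddr2 VARCHAR(50),
  city VARCHAR(50),
  state VARCHAR(2),
  zip VARCHAR(10),
  phoneNumber VARCHAR(25),
  faxNumber VARCHAR(25),
  websiteURL VARCHAR(255)
)

CREATE TABLE Contact (
  contactId INT IDENTITY(1,1) PRIMARY KEY,   -- auto-generated key
  businessId INT NOT NULL FOREIGN KEY REFERENCES Business(businessId),
  firstName VARCHAR(50),
  lastName VARCHAR(50),
  phoneNumber VARCHAR(25),
  email VARCHAR(100)
)
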
Get Last ID

Talend provides components for retrieving the last auto-generated id: tMSSqlLastInsertId, tMysqlLastInsertId.  These components are used in the middle of a sequence of output components.  For example, the following job shows a tMSSqlLastInsertId in between tMSSqlOutput components "Business" and "Contact".  Business writes to the Business table.  Contact writes to the Contact table.  tMSSqlLastInsertId adds a special 'last_insert_id' field in the processing flow which is used to carry the generated businessId into the INSERT statement for Contact.


Job Using last_insert_id
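
Conceptually, the work done for each row in this chain is equivalent to the following SQL sketch; the column lists are abbreviated and the placeholders are illustrative.  The middle SELECT is what tMSSqlLastInsertId executes, and its result becomes the last_insert_id field in the flow.

-- Business component: insert the parent row
INSERT INTO Business (businessName, streetAddr1, ...) VALUES (?, ?, ...)

-- tMSSqlLastInsertId: retrieve the generated key as last_insert_id
SELECT @@IDENTITY

-- Contact component: insert the child row using last_insert_id
INSERT INTO Contact (businessId, firstName, lastName, phoneNumber, email)
VALUES (<last_insert_id>, ?, ?, ?, ?)
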
Pass-Through Schemas
  
When using tMSSqlOutput components, you'll generally use a schema from a repository, which Talend Open Studio can keep in sync with the database.  This improves maintainability.  In the above job, however, the schema used for Business must be expanded in order to carry the Contact fields over to the Contact component.


A Pass-Through Schema Carries Fields for Later Processing
Because fields like Primary_Contact_FirstName aren't part of the Business table insert, they are excluded using Field options on the Advanced settings tab.  Note that "Use batch" is unchecked, which is required for last_insert_id to be set.


Ignore Pass-Through Fields
With a Stored Procedure


The above example works, but the approach becomes more difficult to maintain if there is another layer of dependency, say a load that writes to a dependent of Contact that requires the auto-generated contactId.  Stored procedures provide a way to encapsulate the call.  A single Talend component is used -- tMSSqlSP -- based on a composite schema of all the fields for all the tables, like the pass-through schema used for Business in the previous example.


Such a job looks like the following with a single tMSSqlSP replacing the Business/last_insert_id/Contact processing chain.


Job Using a Stored Procedure for a Multitable Write
A composite schema containing all the fields for all the table inserts is used.  This set of fields matches the number of parameters for the stored procedure.


Schema Used in tMSSqlSP Procedure
And the corresponding configuration uses the following parameters.  Note that the green double-plus button takes the schema and creates a full set of parameters, one for each field in the schema.


Configuration of a tMSSqlSP Component


Stored Procedure Source

Like the original job, the stored procedure writes to both the Business and the Contact tables.  For the Contact write, a SELECT @@IDENTITY is used to set a variable.  This is the same SELECT executed by a tMSSqlLastInsertId component.


CREATE PROC dbo.InsertBusiness
  @businessName AS VARCHAR(50),
  @streetAddr1 AS VARCHAR(50),
  @streetAddr2 AS VARCHAR(50),
  @city AS VARCHAR(50),
  @usState AS VARCHAR(2),
  @zip AS VARCHAR(10),
  @phoneNumber AS VARCHAR(25),
  @faxNumber AS VARCHAR(25),
  @websiteURL AS VARCHAR(255),
  @firstName AS VARCHAR(50),
  @lastName AS VARCHAR(50),
  @contactPhoneNumber AS VARCHAR(25),
  @email AS VARCHAR(100)
AS
    DECLARE @businessId INT

    -- Insert the parent row
    INSERT INTO Business (businessName, streetAddr1, streetAddr2, city,
    state, zip, phoneNumber, faxNumber, websiteURL)
    VALUES (
    @businessName, @streetAddr1, @streetAddr2, @city, @usState, @zip,
    @phoneNumber, @faxNumber, @websiteURL
    )

    -- Capture the identity value generated by the Business insert
    SELECT @businessId = @@IDENTITY

    -- Insert the child row using the captured businessId
    INSERT INTO Contact (businessId, firstName, lastName, phoneNumber, email)
    VALUES (
    @businessId, @firstName, @lastName, @contactPhoneNumber, @email
    )
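
For a quick smoke test outside of Talend, the procedure can be called directly.  The values below are made-up sample data.

-- Inserts one Business row and one related Contact row
EXEC dbo.InsertBusiness
  @businessName = 'Acme Widgets',
  @streetAddr1 = '100 Main St',
  @streetAddr2 = NULL,
  @city = 'Springfield',
  @usState = 'IL',
  @zip = '62701',
  @phoneNumber = '555-555-1212',
  @faxNumber = NULL,
  @websiteURL = 'http://www.example.com',
  @firstName = 'Jane',
  @lastName = 'Doe',
  @contactPhoneNumber = '555-555-3434',
  @email = 'jane.doe@example.com'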



Talend Open Studio's management of schemas is a big productivity boost.  Both examples presented in this post would benefit from a composite schema -- one that combines Business and Contact -- saved in the Repository.  The tMSSqlLastInsertId technique can be used reliably, but if there are more related writes, consider encapsulating the logic in a stored procedure call.  Also, if performance is critical, you can move certain lookups into the stored procedure, where retrieval is much faster.  Note that in both cases, the inserts are made row-by-row with the input, because batching is turned off in the tMSSqlLastInsertId example, though the commit may be deferred.
