Skip to main content

Calling a Stored Procedure from Talend Open Studio

Writing to a set of related tables may present a timing challenge to Talend Open Studio developers.  This post shows how to use a stored procedure to write to two related MS SQL Server tables.

Databases like MS SQL Server and MySQL use special identity columns to provide a unique auto-incremented value for a row.  This value is often used as a primary key.  Working with these columns presents a special challenge to Talend Open Studio developers because there may not be a business key that supports a lookup on the generated id.

Related Tables

The tables Business and Contact are releated by a foreign key 'businessId' which is the primary key of the Business table.  A Contact record can't be inserted without a valid foreign key.  'businessName' is not unique, so a lookup cannot be established after-the-fact.


Two Tables Related by an Auto Increment Primary Key
Get Last ID

Talend provides components for retrieving the last auto-generated id: tMSSqlLastInsertId, tMysqlLastInsertId.  These components are used in the middle of a sequence of output components.  For example, the following job shows a tMSSqlLastInsertId in between tMSSqlOutput components "Business" and "Contact".  Business writes to the Business table.  Contact writes to the Contact table.  tMSSqlLastInsertId adds a special 'last_insert_id' field in the processing flow which is used to carry the generated businessId into the INSERT statement for Contact.


Job Using last_insert_id
Pass-Through Schemas
  
When using tMSSqlOutput components, you'll generally use a schema from a repository which Talend Open Studio can keep in sync with the database.  This improves maintenance.  In the above job however, the schema used for Business needs to be expanded in order to carry the Contact fields over to the Contact component.


A Pass-Through Schema Carries Fields for Later Processing
Because the fields like Primary_Contact_FirstName aren't part of the Business table insert, they are excluded using the Advanced settings tab / Field options.  Note the unchecked "Use batch" which is required for the last_insert_id to be set.


Ignore Pass-Through Fields
With a Stored Procedure


The above example works, but the approach can become more difficult to maintain if there is another layer of dependency, say a load that will write to a dependent of Contact the requires the auto-generated contactId.  Stored procedures provide a way to encapsulate the call.  A single Talend component is used -- tMSSqlSP -- that is based on a composite schema of all the fields for all the tables, as was used in the Pass-through schema for Business in the previous example.


Such a job looks like the following with a single tMSSqlSP replacing the Business/last_insert_id/Contact processing chain.


Job Using a Stored Procedure for a Multitable Write
A composite schema containing all the fields for all the table inserts is used.  This set of fields matches the number of parameters for the stored procedure.


Schema Used in tMSSqlSP Procedure
And the corresponding configuration uses the following parameters.  Note the green double-plus button will take the schema and create a full set of parameters, one for each field in the schema.


Configuration of a tMSSqlSP Component


Stored Procedure Source

Like the original job, the stored procedure writes to both the Business and the Contact tables.  For the Contact write, a SELECT @@IDENTITY is used to set a variable.  This is the same SELECT executed by a tMSSqlLastInsertId component.


CREATE PROC dbo.InsertBusiness
  @businessName AS VARCHAR(50),
  @streetAddr1 AS VARCHAR(50),
  @streetAddr2 AS VARCHAR(50),
  @city AS VARCHAR(50),
  @usState AS VARCHAR(2),
  @zip AS VARCHAR(10),
  @phoneNumber AS VARCHAR(25),
  @faxNumber AS VARCHAR(25),
  @websiteURL AS VARCHAR(255),
  @firstName AS VARCHAR(50),
  @lastName AS VARCHAR(50),
  @contactPhoneNumber AS VARCHAR(25),
  @email AS VARCHAR(100)
AS
    DECLARE @businessId INT
   
    INSERT INTO Business (businessName, streetAddr1, streetAddr2, city,
    state, zip, phoneNumber, faxNumber, websiteURL)
    VALUES (
    @businessName, @streetAddr1, @streetAddr2, @city, @usState, @zip,
    @phoneNumber, @faxNumber, @websiteURL
    )
   
    SELECT @businessId = @@IDENTITY

    INSERT INTO Contact (businessId, firstName, lastName, phoneNumber, email)
    VALUES (
    @businessId, @firstName, @lastName, @contactPhoneNumber, @email
    )



Talend Open Studio's management of schemas is a big productivity boost.  Both examples presented in this post would benefit with a composite schema -- one that combines Business and Contact -- saved in the Repository.  The tMSSqlLastInsertId technique can be used reliably, but if there are more related writes, consider encapsulating the data with a stored procedure call.  Also, if performance is critical, you can move certain lookups into the database in a stored procedure where the retrieval is much faster.  Note that in both cases, the inserts are made row-by-row with the input.  This is because batching is turned off in the tMSSqlLastInsertId example, though the commit may be deferred.

Comments

Popular posts from this blog

ODI KM Adding Order by Option

You can add Order by statement to queries by editing KM.I have edited IKM SQL Control Append to provide Order by.  1) Add an option to KM named USE_ORDER_BY, its type is Checkbox and default value is False. This option determines you want an order by statement at your query. 2)Add second option to KM named ORDER_BY, type is Text. You will get order by values to your query by this option. 3) Editing Insert New Rows detail of KM. Adding below three line code after having clause. That's it! <% if (odiRef.getOption("USE_ORDER_ BY").equals("1")) { %> ORDER BY <%=odiRef.getOption("ORDER_BY" )%> <%} %>  If USE_ORDER_BY option is not used, empty value of ORDER_BY option get error. And executions of KM appears as such below; At this execution, I checked the KM to not get errors if ORDER_BY option value is null. There is no prove of ORDER BY I'm glad.  Second execution to get  Ord...

Creating Yellow Interface in ODI

Hello everyone! In Oracle data integrator (ODI), an  interface  is an object which populates one datastore, called the  target , with data coming from one or more other datastores, known as  sources . The fields of the source datastore are linked to those in the target datastore using the concept of  Mapping . Temporary interfaces used in ODI are popularly known as  Yellow Interfaces . It is because ODI generates a yellow icon at the time of creation of a yellow interface as opposed to the blue icon of a regular interface. The advantage of using a yellow interface is to avoid the creation of  Models each time you need to use it in an interface. Since they are temporary, they are not a part of the data model and hence don’t need to be in the Model. So let’s begin and start creating our yellow interface! Pre-requisites : Oracle 10g Express Edition with *SQL Plus, Oracle Data Integrator 11g. Open *SQL Plus and create a new table  Sales ...

Synchronous and Asynchronous execution in ODI

In data warehouse designing, an important step is to deciding which step is before/after. Newly added packages and required DW data must be analyzed carefully. Synchronous addings can lengthen ETL duration. Interfaces, procedures without generated scenario cannot be executed in parallel. Only scenario executions can be parallel in ODI. Default scenario execution is synch in ODI. If you want to set a scenario to executed in parallel then you will write “-SYNC_MODE=2″ on command tab or select Synchronous / Asynchronous option Asynchronous in General tab. I have created a package as interfaces executes as; INT_JOBS parallel  INT_REGIONS synch  INT_REGIONS synch  INT_COUNTRIES synch  INT_LOCATIONS parallel  INT_EMPLOYEES parallel (Interfaces are independent.) Selecting beginning and ending times and durations from repository tables as ODI 11g operator is not calculating these values. It is obvious in ODI 10g operator. SELECT    sess_no...