Analysing ODI batch performance

I’ve been involved with some performance work around an ODI DWH load batch. The batch comprises well over 1000 tasks in ODI, and whilst the Operator console is not a bad interface, it’s not very easy to spot the areas consuming the most runtime.
Here’s a set of SQL statements to run against the ODI work repository tables to help you methodically find the steps of most interest for tuning efforts.

odi_04.sql — Session runtimes, including child sessions

First off is the fanciest: using hierarchical SQL, it returns all sessions and their child sessions:

-- odi_04.sql
--
-- ODI sessions and child sessions - runtimes
--
--
select --level,
    --parent_sess_no,
    sess_no,
  --Following column can be included if you want to see the root parent session name
    --CONNECT_BY_ROOT sess_name "Root session",
  --
  -- Remove the lpad if you don't want child sessions indented in the results
    lpad('> ',3*(level-1),'-') || sess_name "Name",
    TO_CHAR(SESS_BEG,'yyyy-mm-dd hh24:mi:ss') as "Session Start",
    TO_CHAR(SESS_END,'yyyy-mm-dd hh24:mi:ss') as "Session End",
    SESS_DUR,
    SESS_STATUS
from SNP_SESSION
-- You can limit how many levels to navigate: level 1 is just the master sessions, level 2 adds the first layer of children, and so on.
--where level <= 3
start with      parent_sess_no is null
-- Use a mixture of the following predicates to identify your batch within the ODI Work Repository, and/or part of the batch of interest
--              and sess_name like '%LOAD%'
--                and sess_status = 'D'
                  and sess_beg between to_date('1/11/2010 09:00','DD/MM/YYYY HH24:MI') and to_date('4/11/2010 18:00','DD/MM/YYYY HH24:MI')
connect by prior sess_no    = parent_sess_no
/
This would return something similar to this:

Example output from odi_04.sql
It’s worth noting a couple of things:
  • “D” status means “Done”. There’s a table in the ODI repository that decodes statuses, although they’re normally obvious (see the sketch just after this list).
  • The duration (SESS_DUR) is the total for the session in seconds, including child sessions. So in the example shown, “LOAD_FACT1_TRANS_BATCH” takes a total of 4510 seconds, which comprises the tasks within the session itself (see below), plus the runtimes of LOAD_FACT1_STUFF (64 seconds) and LOAD_FACT1_SALES (4443 seconds). A second sketch below shows how to separate a session’s own time from its children’s.
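
For convenience, the common status codes can be decoded inline. A minimal sketch, assuming the usual SESS_STATUS values (verify them against the decode table in your own repository):

-- Sketch: inline decode of the common SESS_STATUS codes.
-- Assumption: these are the usual values - check your repository's decode table.
select sess_no,
    sess_name,
    decode(sess_status, 'D','Done', 'E','Error', 'M','Warning',
                        'R','Running', 'W','Waiting', 'Q','Queued',
                        sess_status) as status
from SNP_SESSION
/
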
From this example, there are plenty of long-running sessions, so let’s pick one I prepared earlier: session number 984170, “LOAD_FACT1_SALES”, which is a child session of “LOAD_FACT1_TRANS_BATCH” (session number 980170).
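
One more aside before drilling in: since SESS_DUR rolls up child sessions, it can be handy to see how much time each session spends outside its children. Here’s a minimal sketch of the idea, assuming SESS_DUR is consistently recorded in seconds; it subtracts the summed durations of a session’s direct children from its own total:

-- Sketch: approximate each session's "own" runtime by subtracting the
-- summed durations of its direct child sessions from its SESS_DUR.
select p.sess_no,
    p.sess_name,
    p.sess_dur as total_secs,
    p.sess_dur - nvl(sum(c.sess_dur),0) as own_secs
from SNP_SESSION p
    left outer join SNP_SESSION c
    on c.parent_sess_no = p.sess_no
group by p.sess_no, p.sess_name, p.sess_dur
order by own_secs desc
/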

odi_06.sql — Session runtime, broken down by step

Using the session number determined by our analysis of the overall batch session runtimes, here’s a query to show the runtime for each step in the session. It is possible to combine this with the hierarchical SQL above, but I personally found it resulted in too much data to sift through, as well as increasing the chances of random stabbing at big numbers. Picking a single session from the first query for deliberate further analysis keeps the mind focussed on it.
-- odi_06.sql
--
-- ODI session broken down by step, for a single session
--
-- 
--
  
select  s.sess_no "Session #",
    sl.nno as "Step Number",
    sess_name "Session Name",
    ss.step_name "Step Name",
  to_char(sl.step_beg,'yyyy-mm-dd hh24:mi:ss') "Step Start",
  to_char(sl.step_end,'yyyy-mm-dd hh24:mi:ss') "Step End",
    sl.step_dur "Step Dur (s)",
from SNP_SESSION S
    left outer join snp_sess_step ss
    on s.sess_no = ss.sess_no
    inner join SNP_STEP_LOG SL
    on ss.sess_no = sl.sess_no
    and ss.nno = sl.nno
where   s.sess_no = 984170
;
For our example session it would give us output something like this:

Example output from odi_06.sql

So now we can see that, for this long-running load session, over 80% of the time is spent on step number 3, “Int. Table2 Merge”.
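
If you’d rather have the proportions computed than eyeball the durations, a variation on odi_06.sql using Oracle’s ratio_to_report analytic does the arithmetic. A sketch, against the same session number:

-- Sketch: each step's share of the total session runtime, for one session.
select sl.nno as "Step Number",
    ss.step_name "Step Name",
    sl.step_dur "Step Dur (s)",
    round(ratio_to_report(sl.step_dur) over () * 100, 1) as "% of Session"
from snp_sess_step ss
    inner join SNP_STEP_LOG sl
    on ss.sess_no = sl.sess_no
    and ss.nno = sl.nno
where ss.sess_no = 984170
order by sl.step_dur desc
;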

odi_07.sql — Session runtime, broken down by task, for a given step

Using the same session number as before, and step number 3 as identified above, let’s have a look at the individual tasks:
-- odi_07.sql
--
-- ODI session broken down by task, for a given session and step
--
-- 
--
  
select  s.sess_no as "Session #",
  sl.nno as "Step #",
    st.scen_task_no as "Task #",
    st.task_name1 || ' - ' || st.task_name2 || ' - ' || st.task_name3 "Task",
  to_char(stl.task_beg,'yyyy-mm-dd hh24:mi:ss') "Task Start",
    stl.task_dur,
    stl.nb_row
from SNP_SESSION S
    left outer join snp_sess_step ss
    on s.sess_no = ss.sess_no
    inner join SNP_STEP_LOG SL
    on ss.sess_no = sl.sess_no
    and ss.nno = sl.nno
    inner join SNP_SESS_TASK ST
    on SS.sess_no = st.sess_no
    and ss.nno = st.nno
    inner join SNP_SESS_TASK_LOG STL
    ON SL.SESS_NO = STL.SESS_NO
    and SL.nno = STL.nno
    and SL.nb_run = STL.nb_run
    and st.scen_task_no = stl.scen_task_no
where   s.sess_no = 984170
    and sl.nno = 3
;

Example output from odi_07.sql

So from here I’d be focussing on two things:
  • Of the long-running tasks, can they be tuned? (A sketch after this list looks at whether a task is consistently slow across runs.)
  • Is this whole step using the most efficient logic? Could some of the tasks be combined or removed entirely? Could the load be done in a better way?
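
On the first point, before reaching for the tuning toolkit it’s worth checking whether a task is consistently slow or just had one bad run. A rough sketch, with the assumption that task names are stable enough to group on:

-- Sketch: run-to-run consistency of task durations, across all sessions.
-- Assumption: task names are stable enough to group on.
select st.task_name1 || ' - ' || st.task_name2 || ' - ' || st.task_name3 "Task",
    count(*) as runs,
    min(stl.task_dur) as min_secs,
    round(avg(stl.task_dur)) as avg_secs,
    max(stl.task_dur) as max_secs
from SNP_SESS_TASK st
    inner join SNP_SESS_TASK_LOG stl
    on st.sess_no = stl.sess_no
    and st.nno = stl.nno
    and st.scen_task_no = stl.scen_task_no
group by st.task_name1, st.task_name2, st.task_name3
order by avg_secs desc
;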

More funky stuff

odi_08.sql — Identify task optimisation candidates

At first glance this is a quick win for listing out the longest-running tasks within a batch, and it is that. But beware of taking a blinkered view of tasks in isolation. The advantage of using the queries above to drill down from overall batch runtime through sessions and steps to individual tasks is that you have the context of each task. Still, the query that follows can be useful as a quick hit list of tasks, to be checked off against the more detailed analysis.

-- odi_08.sql
--
-- ODI task optimisation candidates
--
-- 
--
select DISTINCT
      --level
  st.task_name1 || ' - ' || st.task_name2 || ' - ' || st.task_name3 "Task",
    stl.task_dur,
  stl.nb_row,
  s.sess_no || '/' ||  sl.nno || '/' || stl.scen_task_no as "Session/Step/Task #",
  SYS_CONNECT_BY_PATH(s.sess_name, ' / ') || ' / ' ||   ss.step_name "Step Name"
from SNP_SESSION S
    left outer join snp_sess_step ss
    on s.sess_no = ss.sess_no
    inner join SNP_STEP_LOG SL
    on ss.sess_no = sl.sess_no
    and ss.nno = sl.nno
    inner join SNP_SESS_TASK ST
    on SS.sess_no = st.sess_no
    and ss.nno = st.nno
    inner join SNP_SESS_TASK_LOG STL
    ON SL.SESS_NO = STL.SESS_NO
    and SL.nno = STL.nno
    and SL.nb_run = STL.nb_run
    and st.scen_task_no = stl.scen_task_no
where stl.task_dur > &&min_duration_secs
and st.task_name3 != 'Run_Subscribed_Process_ID' -- Ignore parent tasks of child sessions
start with      parent_sess_no is null
        and sess_beg between to_date('1/11/2010 09:00','DD/MM/YYYY HH24:MI') and to_date('1/11/2010 18:00','DD/MM/YYYY HH24:MI')
connect by prior s.sess_no  = s.parent_sess_no
order by stl.task_dur desc
/

Example output from odi_08.sql

As can be seen clearly from this, there are several different types of task within ODI, and not all will have a row count associated with them.

We can start doing more advanced analysis using this data. For example, of the tasks that do return row counts, what rows/sec throughput are we getting? Are there any steps where the throughput is abnormally low, and therefore a candidate for further examination? Dumping the output of odi_08.sql into Excel, adding a derived rows/sec column, and applying Colour Scales Conditional Formatting gives this:



Throughput analysis, using data from odi_08.sql


Looking at this and picking out two tasks that ran for about seven minutes, I’d be a lot more interested in “dummy task 13”, which processed just over 2 million rows in that time, than in “dummy task 2”, which processed nearly seven times as many (13 million). Now it may be that one is doing a direct insert and the other some complicated merge logic, but it’s well worth checking before just heading for the biggest runtime numbers.
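
If you’d rather calculate the throughput before it ever reaches Excel, the derived column is easy enough in SQL. A sketch, guarding against divide-by-zero on sub-second tasks:

-- Sketch: rows/sec throughput per task execution, lowest throughput first.
select st.task_name1 || ' - ' || st.task_name2 || ' - ' || st.task_name3 "Task",
    stl.task_dur,
    stl.nb_row,
    case when stl.task_dur > 0 then round(stl.nb_row / stl.task_dur) end as rows_per_sec
from SNP_SESS_TASK st
    inner join SNP_SESS_TASK_LOG stl
    on st.sess_no = stl.sess_no
    and st.nno = stl.nno
    and st.scen_task_no = stl.scen_task_no
where stl.nb_row > 0
order by rows_per_sec
;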
