SCD Types
1) Overwrite old data with new
2) Adding a new row with surrogate key
3) Adding columns represents first, last and previous data
Other: If data is big, you can take monthly snapshots to save difference. Data is smaller and you want to track multiple columns, then you can take daily snapshots.
SCD type 2 is mostly prefered as keeps old versions. So let's roll on ODI for SCD type 2.
Our example will be sh.channels table comes default on Oracle db;
CREATE TABLE CHANNELS
(
CHANNEL_ID NUMBER NOT NULL,
CHANNEL_DESC VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS_ID NUMBER NOT NULL,
CHANNEL_TOTAL VARCHAR2(13 BYTE) NOT NULL,
CHANNEL_TOTAL_ID NUMBER NOT NULL
);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
3, 'Direct Sales', 'Direct', 12, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
9, 'Tele Sales', 'Direct', 12, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
5, 'Catalog', 'Indirect', 13, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
4, 'Internet', 'Indirect', 13, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
2, 'Partners', 'Others', 14, 'Channel total', 1);
COMMIT;
Channels_hist table is created for SCD;
CREATE TABLE CHANNELS_HIST
(
CHANNEL_ID NUMBER NOT NULL,
CHANNEL_DESC VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS_ID NUMBER NOT NULL,
CHANNEL_TOTAL VARCHAR2(13 BYTE) NOT NULL,
CHANNEL_TOTAL_ID NUMBER NOT NULL,
ETL_START_DATE DATE,
ETL_END_DATE DATE,
CURRENT_FLAG NUMBER
);
On the side window, scrool to Channel_hist model, expand model columns and double-click to channel_id, go to description tab and select Slowly Changing Behavior:Natural Key option.
Slowly Changing Behavior: Add row on change option is selected for CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL, CHANNEL_TOTAL_ID columns.
Following Fields and Selected Slowly Changing Behaviors:
ETL_START_DATE - Starting timestamp,
ETL_END_DATE - Ending timestamp,
CURRENT_FLAG - Current record flag
IKM Slowly Changing Dimension is selected on flow tab of interface INT_CHANNELS_HIST.
When interface is executed, etl_end_date field gets 01/01/2400 as hard-coded in KM even I mapped it to to_date('29990101', 'YYYYMMDD').
First version of Channels_hist table is right below:
select * from CHANNELS_HIST
CHANNEL_ID CHANNEL_DESC CHANNEL_CLASS CHANNEL_CLASS_ID CHANNEL_TOTAL
CHANNEL_TOTAL_ID ETL_START_DATE ETL_END_DATE CURRENT_FLAG
3 Direct Sales Direct 12 Channel total 1 18.10.2011 21:53:17 1
9 Tele Sales Direct 12 Channel total 1 18.10.2011 21:53:17 1
5 Catalog Indirect 13 Channel total 1 18.10.2011 21:53:17 1
4 Internet Indirect 13 Channel total 1 18.10.2011 21:53:17 1
2 Partners Others 14 Channel total 1 18.10.2011 21:53:17 1
And now I will update data:
UPDATE CHANNELS
SET CHANNEL_DESC='Telephone'
where channel_id=9;
UPDATE CHANNELS
SET CHANNEL_DESC='Direct'
where channel_id=3;
commit;
ODI generates two new records holds new data after Re-executing the interface. Old versions flagged as current_flag=0. ETL_END_DATE updated to 18.10.2011 23:32:07. And new data shows us updated versions as below:
CHANNEL_ID CHANNEL_DESC CHANNEL_CLASS CHANNEL_CLASS_ID CHANNEL_TOTAL CHANNEL_TOTAL_ID ETL_START_DATE ETL_END_DATE CURRENT_FLAG
--Old records
3 Direct SalesDirect 12 Channel total 1 18.10.2011 23:11:30 18.10.2011 23:32:07 0
9 Tele Sales Direct 12 Channel total 1 18.10.2011 23:11:30 18.10.2011 23:32:07 0
--New records
3 Direct Direct 12 Channel total 1 18.10.2011 23:32:07 01.01.2400 1
9 Telephone Direct 12 Channel total 1 18.10.2011 23:32:07 01.01.2400 1
Instead of etl_start_date and etl_end_date fields, we can use insert and update dates of source table if existed. But we have created our insert-update dates in SCD system.
Actual, last, current records can be selected as below:
Now one more test, add a new record:
Re-execute the interface and select new record with channel_id.
Second more test :D
Delete the new record and see how is the change on SCD.
Nothing changed!!!
Explanation on IKM Slowly changing dimension definition mentions about this restriction:
1) Overwrite old data with new
2) Adding a new row with surrogate key
3) Adding columns represents first, last and previous data
Other: If data is big, you can take monthly snapshots to save difference. Data is smaller and you want to track multiple columns, then you can take daily snapshots.
SCD type 2 is mostly prefered as keeps old versions. So let's roll on ODI for SCD type 2.
Our example will be sh.channels table comes default on Oracle db;
CREATE TABLE CHANNELS
(
CHANNEL_ID NUMBER NOT NULL,
CHANNEL_DESC VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS_ID NUMBER NOT NULL,
CHANNEL_TOTAL VARCHAR2(13 BYTE) NOT NULL,
CHANNEL_TOTAL_ID NUMBER NOT NULL
);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
3, 'Direct Sales', 'Direct', 12, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
9, 'Tele Sales', 'Direct', 12, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
5, 'Catalog', 'Indirect', 13, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
4, 'Internet', 'Indirect', 13, 'Channel total', 1);
INSERT INTO SH.CHANNELS ( CHANNEL_ID, CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL,
CHANNEL_TOTAL_ID ) VALUES (
2, 'Partners', 'Others', 14, 'Channel total', 1);
COMMIT;
Channels_hist table is created for SCD;
CREATE TABLE CHANNELS_HIST
(
CHANNEL_ID NUMBER NOT NULL,
CHANNEL_DESC VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS VARCHAR2(20 BYTE) NOT NULL,
CHANNEL_CLASS_ID NUMBER NOT NULL,
CHANNEL_TOTAL VARCHAR2(13 BYTE) NOT NULL,
CHANNEL_TOTAL_ID NUMBER NOT NULL,
ETL_START_DATE DATE,
ETL_END_DATE DATE,
CURRENT_FLAG NUMBER
);
After table is created, switch to ODI and open model tab, reverse channels_hist table, and select OLAP type: Slowly changing dimension option checked.
On the side window, scrool to Channel_hist model, expand model columns and double-click to channel_id, go to description tab and select Slowly Changing Behavior:Natural Key option.
Slowly Changing Behavior: Add row on change option is selected for CHANNEL_DESC, CHANNEL_CLASS, CHANNEL_CLASS_ID, CHANNEL_TOTAL, CHANNEL_TOTAL_ID columns.
Following Fields and Selected Slowly Changing Behaviors:
ETL_START_DATE - Starting timestamp,
ETL_END_DATE - Ending timestamp,
CURRENT_FLAG - Current record flag
IKM Slowly Changing Dimension is selected on flow tab of interface INT_CHANNELS_HIST.
When interface is executed, etl_end_date field gets 01/01/2400 as hard-coded in KM even I mapped it to to_date('29990101', 'YYYYMMDD').
First version of Channels_hist table is right below:
select * from CHANNELS_HIST
CHANNEL_ID CHANNEL_DESC CHANNEL_CLASS CHANNEL_CLASS_ID CHANNEL_TOTAL
CHANNEL_TOTAL_ID ETL_START_DATE ETL_END_DATE CURRENT_FLAG
3 Direct Sales Direct 12 Channel total 1 18.10.2011 21:53:17 1
9 Tele Sales Direct 12 Channel total 1 18.10.2011 21:53:17 1
5 Catalog Indirect 13 Channel total 1 18.10.2011 21:53:17 1
4 Internet Indirect 13 Channel total 1 18.10.2011 21:53:17 1
2 Partners Others 14 Channel total 1 18.10.2011 21:53:17 1
And now I will update data:
UPDATE CHANNELS
SET CHANNEL_DESC='Telephone'
where channel_id=9;
UPDATE CHANNELS
SET CHANNEL_DESC='Direct'
where channel_id=3;
commit;
ODI generates two new records holds new data after Re-executing the interface. Old versions flagged as current_flag=0. ETL_END_DATE updated to 18.10.2011 23:32:07. And new data shows us updated versions as below:
CHANNEL_ID CHANNEL_DESC CHANNEL_CLASS CHANNEL_CLASS_ID CHANNEL_TOTAL CHANNEL_TOTAL_ID ETL_START_DATE ETL_END_DATE CURRENT_FLAG
--Old records
3 Direct SalesDirect 12 Channel total 1 18.10.2011 23:11:30 18.10.2011 23:32:07 0
9 Tele Sales Direct 12 Channel total 1 18.10.2011 23:11:30 18.10.2011 23:32:07 0
--New records
3 Direct Direct 12 Channel total 1 18.10.2011 23:32:07 01.01.2400 1
9 Telephone Direct 12 Channel total 1 18.10.2011 23:32:07 01.01.2400 1
select * from CHANNELS_HIST
where current_flag=1
insert into sh.channels
values(10, 'Brochure', 'Indirect', 13, 'Channel total', 1);
commit;
select * from channels_hist
where channel_id=10
CHANNEL_ID CHANNEL_DESC CHANNEL_CLASS CHANNEL_CLASS_ID CHANNEL_TOTAL CHANNEL_TOTAL_ID ETL_START_DATE ETL_END_DATE CURRENT_FLAG
10 Brochure Indirect 13 Channel total 1 18.10.2011 23:45:11 01.01.2400 1
Second more test :D
Delete the new record and see how is the change on SCD.
delete from channels
where channel_id=10;
commit;
Nothing changed!!!
Explanation on IKM Slowly changing dimension definition mentions about this restriction:
- As this Knowledge Module is dedicated to Datawarehouse Projects (where data should not be removed), when using it with a journalized Source Datastore, it is not possible to synchronize deletions. Therefore, data should be manually filtered to exclude any source delete events (add a filter on JRN_FLAG = 'I').
Hope this will be helpfull. See ya! : D
Comments
Post a Comment