Imple­ment­ing Slowly Chan­ging Dimen­sions Type 2 in a Data Ware­house with ODI



By default, a trans­ac­tional sys­tem can alter inform­a­tion, but only keeps track of the most recent ver­sion of the data that has been mod­i­fied. To pre­serve a com­plete his­tory of all modi­fic­a­tions and to ana­lyse the spe­cific value of a piece of data at any given time, we can imple­ment Slowly Chan­ging Dimen­sion (SCD) meth­od­o­lo­gies in our data ware­house. This gives us a his­tor­ical record of how the value of a field in a dimen­sion is chan­ging over time, either occa­sion­ally or constantly.

There are 6 types of SCD:

SCD Type 1: Overwrite.

SCD Type 2: Add row.

SCD Type 3: Add column.

SCD Type 4: Sep­ar­ate history.

SCD Type 5: Add mini dimension.

SCD Type 6: Hybrid.

In this art­icle today we’ll focus on an SCD Type 2 strategy to store the his­tory of changes, and see how can this be imple­men­ted in Oracle Data Integ­rator (ODI) using its spe­cific built-in func­tion­al­it­ies for DWH integ­ra­tion scenarios.

How SCDs Work

SCD tables are spe­cific­ally for man­aging and main­tain­ing the his­tory of changes in data over time. Type 2 dimen­sions are com­monly used to keep track of vari­ations in the attrib­utes of a dimen­sion, so let’s see how they work and look at an approach to cre­ate an SCD Type 2 table.

How SCD Type 2 Tables Work

Imple­ment­ing SCD Type 2 involves using tech­niques like ver­sion­ing and effect­ive date ranges. The object­ive of a SCD Type 2 is to track his­tor­ical changes in dimen­sional data over time, while ensur­ing the integ­rity of the exist­ing data is maintained.

Ver­sion Registration

In an SCD Type 2 table, each row rep­res­ents a dis­tinct ver­sion of a dimen­sion mem­ber. A new row is added whenever there is a change in the dimen­sion attrib­utes, ensur­ing that the his­tor­ical data is preserved.

Effect­ive and Expir­a­tion Attributes

An SCD Type 2 table includes effect­ive attrib­utes and expir­a­tion inform­a­tion. The effect­ive attrib­utes spe­cify when a dimen­sion mem­ber becomes valid and is in use, while the expir­a­tion inform­a­tion indic­ates the date when a par­tic­u­lar ver­sion ceases to be active.

Change Man­age­ment

When a dimen­sion attrib­ute changes, a new row is inser­ted with the updated val­ues and a cor­res­pond­ing timestamp. The pre­vi­ous row is then “closed” by set­ting its expir­a­tion date to the moment just before the change occurred.

Strategy for Cre­at­ing an SCD Type 2 Table
Identi­fy­ing Changes 

This step involves determ­in­ing which dimen­sion attrib­utes need to be tracked and defin­ing the con­di­tions under which changes should be recorded.

Table Design

Define the table struc­ture, mak­ing sure to include fields for the nat­ural key, effect­ive attrib­utes, expir­a­tion date, and any other attrib­utes that need to be tracked.

Con­trol Fields

Add con­trol fields such as Start Date and End Date to man­age the tem­poral valid­ity of each version.

Act­ive Flag 

The act­ive flag indic­ates whether a row ver­sion of the record is cur­rently act­ive. If the record is cur­rent, the flag will be set to 1 or Y; if not, it will be set to 0 or N. Whenever there’s a change in the record’s val­ues, a new row is added, and the act­ive flag, along with other rel­ev­ant columns, must be updated to reflect the change history.

Below you can see an example of SCD func­tion­al­ity, where we’re con­sid­er­ing an employee whose status has been changed to “Removed”:

table with employee ID and status Removed

Let’s assume that, cur­rently, there is only one record for this employee in the SCD, reflect­ing their his­tor­ical status as “Removed”.

How­ever, dur­ing the next data load, the employee’s status changes to “Con­trac­ted”:

table with employee ID and status Contracted

After the data load, the SCD will con­tain a new record, show­ing the employee’s updated status and the effect­ive date of the change.

By default, the cur­rently valid employee status (with Act­ive Flag: Y) will have a valid­ity date of 31/12/4712; this is the default set­ting in Oracle. When the employee’s status changes and a new record is cre­ated, the Date To field will reflect the date of the load:

table with employee ID and status Removed and status Contracted

To sum­mar­ise, an SCD Type 2 design should be used when a Fact table includes peri­ods or dates as part of its struc­ture and requires accur­ate track­ing of the his­tor­ical ver­sions of a dimension’s attrib­utes. This approach ensures that each row in the Fact table points to the cor­rect ver­sion of the dimen­sion, based on the date of the fact and the dimension’s effect­ive dates.

Imple­ment­ing SCD Type 2 in ODI

The key steps for the imple­ment­a­tion of an SCD in ODI are as follows:

  1. Access ODI Studio.
  2. Cre­ate a new Data Model to bring data­base table metadata into ODI.
  3. Con­fig­ure the table as an SCD.
  4. Con­fig­ure the SCD attributes.
  5. Cre­ate a new Mapping.
  6. Con­fig­ure Logical and Phys­ical prop­er­ties in mapping.
  7. Test the Map­ping and val­id­ate it.
  8. Deploy and main­tain the Mapping.
Step-By-Step Guide on How To Cre­ate and Con­fig­ure A Slowly Chan­ging Dimension

1. Access ODI Studio

Open Oracle Data Integ­rator to begin design and development.

2. Data Model Creation

Cre­ate a Data Model that reflects the struc­ture of the SCD table. As the table has already been cre­ated in the data­base, we will reverse-engin­eer it to load into ODI within the desired Data Model:

Reverse Engineer the table in Oracle Data Integrator

3. Con­fig­ure the Data Model as SCD

Once the table has been loaded into the Data Model, open it to pro­ceed with the con­fig­ur­a­tion. Within the Data Model, loc­ate your dimen­sion and open it with a double-click; this will open a win­dow like the one shown below. In this win­dow, nav­ig­ate to the Defin­i­tion tab and set the OLAP Type to Slowly Chan­ging Dimen­sion:

Configure the Data Model as SCD

4. Con­fig­ure SCD attrib­utes (in Data Model)

In the Attrib­utes tab in the data­store, con­fig­ure each column of the table with one of the SCD Beha­vior options:

Configure SCD attributes (in Data Model)

Sur­rog­ate Key (): Select this option if your dimen­sion uses a sur­rog­ate key. A sur­rog­ate key is an arti­fi­cially gen­er­ated key cre­ated dur­ing the ETL pro­cess for integ­ra­tion pur­poses. We highly recom­mend using sur­rog­ate keys in all data ware­house scen­arios due to their many bene­fits. For example, sur­rog­ate keys are essen­tial when your tar­get records lack a dir­ect nat­ural key that guar­an­tees unique­ness. This can occur in situ­ations such as a Per­son table (where two indi­vidu­als might share the same name and birth­d­ate) or in log records (where two events could occur sim­ul­tan­eously, with the same timestamp). Sur­rog­ate keys are also cru­cial when integ­rat­ing data from mul­tiple source sys­tems or instances.

Nat­ural Key (man­dat­ory): The nat­ural key is a unique key in your table that uniquely iden­ti­fies a record. Every table should have a nat­ural key, neces­sary for accur­ately updat­ing or delet­ing records. This option is used to des­ig­nate the primary key for the table.

Over­write on Change (optional): Use this option for dimen­sion attrib­utes that should be over­writ­ten when their data changes, without adding a new row of information. 

Add Row on Change (man­dat­ory): Con­fig­ure this option for attrib­utes in your dimen­sion where you want to main­tain his­tor­ical inform­a­tion. When the value of these fields changes, a new row will be added to pre­serve his­tor­ical val­ues. What’s more, the Start Date, End Date, and Cur­rent Record Flag fields will be updated in the pre­vi­ous row.

Start­ing Timestamp (man­dat­ory): This option marks the point in time when the attrib­utes for a spe­cific ver­sion of the dimen­sion become effect­ive. Typ­ic­ally, this field is set to the date when the record is loaded, usu­ally using SYSDATE.

End­ing Timestamp (man­dat­ory): This option indic­ates when the cur­rent ver­sion of the dimen­sion becomes invalid and is suc­ceeded by a new ver­sion. It spe­cifies the time just before the dimension’s attrib­utes change, with the next ver­sion begin­ning from this timestamp.

Cur­rent Record Flag (man­dat­ory): This option is set in the Flag field to indic­ate whether the record is act­ive (con­tain­ing the most recent inform­a­tion) or inact­ive (con­tain­ing his­tor­ical information).

Below we can see how we con­figured these attrib­utes in our example SCD table:

Configuration of attributes in example SCD table

5. Cre­ate the Map­ping to load the data flow from the Source table into the SCD dimension

Now let’s cre­ate the Map­ping that will include both the Source table and the Des­tin­a­tion table where the data will be loaded:

Create Mapping from source table to destination table

This would be the logical con­fig­ur­a­tion for each of the tar­get fields:

logical configuration for each of the target fields

As we can see, the data integ­ra­tion pro­cess in the Des­tin­a­tion table is straight­for­ward, as ODI handles the logic needed to pro­cess and incre­ment­ally load the incom­ing data.

Addi­tional  Information

  • START_DATE: This date field indic­ates when a record was inser­ted into the table, mark­ing the start of its valid­ity. As long as this date is less than the END_DATE of the record, the record is con­sidered active.
  • END_DATE: This date field indic­ates when the record’s valid­ity period ends. The status of the record depends on whether this date is in the past or the future. If the END_DATE is in the past, the record is no longer valid, as a more recent record exists, but if the END_DATE is in the future, the record is con­sidered the most current.

By default, the date fields can be left without spe­cific logic, as one of the prop­er­ties of the SCD-type IKM (Integ­ra­tion Know­ledge Mod­ule) is to auto­mat­ic­ally man­age these val­ues, so the manual con­fig­ur­a­tion of these two fields isn’t necessary.

6. Con­fig­ur­ing the Map­ping with the SCD Table 

Logical tar­get Integ­ra­tion type

In the Logical sec­tion of the Map­ping, select the SCD table, then in the Prop­er­ties sec­tion, under Tar­get, set the Integ­ra­tion Type to Slowly Chan­ging Dimen­sion:

Set itnegration type to slowly changing dimension

Phys­ical Tar­get Integ­ra­tion Know­ledge Mod­ule (IKM)

In the phys­ical design of the Map­ping, click on the SCD Table and, in Prop­er­ties, go to the Integ­ra­tion Know­ledge Mod­ule option and select the Slowly Chan­ging Dimen­sion IKM. You can use the default one provided by Oracle, or use a cus­tom­ised IKM spe­cific to your project:

Select Slowly Changing Dimension IKM as Integration Knowledge Module

7. Test­ing and Validation 

The test­ing and val­id­a­tion phase is essen­tial to ensure the cor­rect imple­ment­a­tion of Slowly Chan­ging Dimen­sions Type 2 in ODI. Here’s a step-by-step guide to ensure the integ­rity and accur­acy of your his­tor­ical data:

i. Ini­tial Load Test­ing
Goal: Verify that the ini­tial data load into SCD Type 2 is per­formed correctly.

Steps:Run the ini­tial load of both his­tor­ical and new data.Validate that the his­tor­ical records are prop­erly loaded and that the primary keys are handled properly.

ii. Sim­u­la­tion of Data Changes

Goal: Eval­u­ate how the sys­tem handles changes in dimen­sion attributes.

Steps:Perform sim­u­la­tions of changes to both his­tor­ical and new attrib­utes. Verify that the pre­vi­ous ver­sions of the records remain intact and that a new ver­sion is cre­ated for each change.

iii. Per­form­ance Analysis 

Goal: Eval­u­ate sys­tem per­form­ance when work­ing with SCD Type 2, espe­cially with large data sets.

Steps:Run load tests with large data sets. Mon­itor and ana­lyse exe­cu­tion times and resource use.

iv. Val­id­a­tion of His­tor­ical Queries

Goal: Con­firm that his­tor­ical quer­ies return accur­ate results.

Steps:Create quer­ies that retrieve data from pre­vi­ous ver­sions of the dimen­sions and com­pare the res­ults with known inform­a­tion to verify accuracy.

v. Incre­mental Change Management 

Goal: Ensure that incre­mental changes are handled correctly.

Steps:Apply incre­mental changes to the data and execute the load. Verify that only rel­ev­ant ver­sions are updated and that no unne­ces­sary duplic­ates are created. 

vi. Val­id­a­tion of Reports and Dashboards

Goal: Ensure that reports and dash­boards based on dimen­sional data are accur­ate and consistent.

Steps:Run reports using SCD Type 2 dimen­sional data. Com­pare the res­ults with expect­a­tions and val­id­ate tem­poral consistency. 

vii. Deploy­ment and Maintenance

Deploy the SCD table to the pro­duc­tion envir­on­ment. Set up peri­odic main­ten­ance tasks to purge obsol­ete data and to ensure con­tin­ued efficiency.

With these steps, the Map­ping to load our SCD table is now fully configured.

Bene­fits of SCD Type 2 with ODI

Imple­ment­ing Slowly Chan­ging Dimen­sions with Oracle Data Integ­rator offers many advant­ages for data integ­ra­tion and his­tor­ical data management.

  1. Change His­tory: Main­tains a com­plete his­tory of data changes over time, cru­cial for ana­lys­ing data evolution.
  2. Tem­poral Ana­lysis: Enables the explor­a­tion of data at spe­cific points in time, essen­tial for his­tor­ical report­ing and trend analysis.
  3. Data Con­sist­ency: Ensures con­sist­ent and reli­able data over time, even as dimen­sion val­ues change.
  4. Change Track­ing: Facil­it­ates track­ing of spe­cific attrib­ute changes, help­ing with audits and troubleshooting.
  5. Improved Per­form­ance: Enhances per­form­ance by effi­ciently struc­tur­ing access to his­tor­ical information.
  6. Flex­ib­il­ity in Admin­is­tra­tion: Adapts to vari­ous busi­ness needs, allow­ing the selec­tion of the appro­pri­ate SCD Type for each dimension.
  7. Improved Decision Mak­ing: Provides a com­pre­hens­ive his­tory of changes, sup­port­ing more informed and accur­ate decision-making.

Con­clu­sions

Using ODI for SCD imple­ment­a­tion not only enhances effi­ciency and con­sist­ency in data man­age­ment but also ensures reli­able and error-free data load­ing pro­cesses. By stand­ard­ising devel­op­ment prac­tices and lever­aging ded­ic­ated IKMs, which encap­su­late the know­ledge on how to per­form data integ­ra­tion in an SCD table, ODI improves the reli­ab­il­ity of res­ults and reduces pro­ject com­ple­tion time.

The absence of SCD dimen­sions can lead to lost change his­tory, incon­sist­ent report­ing, and increased long-term main­ten­ance costs. Imple­ment­ing SCD tables in ODI is cru­cial to main­tain his­tor­ical data integ­rity, facil­it­ate detailed his­tor­ical ana­lysis, and sup­port bet­ter decision-making.

Ulti­mately, ODI’s auto­mated change detec­tion and man­age­ment pro­cess optim­ises dimen­sion man­age­ment, con­trib­ut­ing to a more effect­ive and accur­ate data ana­lysis in a data ware­house environment.

If you’re look­ing to imple­ment or optim­ise Slowly Chan­ging Dimen­sions within your data ware­house envir­on­ment, or if you need expert guid­ance on using Oracle Data Integ­rator for effi­cient and reli­able data man­age­ment, we’re here to help. Our team of exper­i­enced pro­fes­sion­als can help you ensure that your data is accur­ate, con­sist­ent, and ready to sup­port your busi­ness decisions. Con­tact us today to learn how we can help you to achieve your data man­age­ment goals!