Slowly Changing Dimensions

Since the facts in your fact tables mark discrete events in time, you would expect them to change pretty often. Your dimensions, on the other hand, tend to be pretty stable. An Employee dimension, for example, wouldn’t change very often. But, it will change. Employees move, get married, croak, or quit.

To handle these changes, data warehouses have adopted the concept of slowly changing dimensions. They are categorized into three types.

Type 1 is the basic type, with it previous versions of the dimension are overwritten with the latest data. Changes are not tracked. This can be OK, based on what the business will be doing with the data.

Take the example of an Employee dimension. Our star employee, Hortence McGillicutty moves, and updates her address. With the warehouse, the only business task done with the address is mailings. In this case, we don’t care what her address was two years ago, we only need her current address for mailing her paycheck and other related data. In this case, loosing the history of her previous address is fine, so a Type 1 dimension is a good choice.

Let’s say, however, that the business uses the Employee dimension much more extensively. Perhaps we have an application that on the left side of the screen displays information pertaining to the most recent financial statements. On the right is a scanned in image where our star Hortence McGillicutty has signed off on these statements.

A few months go by, and our heroine falls in love and gets married. If we were still using a Type 1, the scanned in image still has her maiden name, but the display to the left now shows her new married name, Hortence Hollywogger. Not a big deal you say? How about the first time a Sarbanes-Oxley auditor comes in, finds the discrepancy, and threatens big fines.

To solve this, data warehousing has Type 2 slowly changing dimensions. With Type 2, from date and to date fields are added to the dimensional table. These dates are the valid dates for that record. In our example, once Hortence got married we’d have two rows in the employee dimension:

Key EmpId Name From To
123 555 Hortence McGillicutty 07/17/2001 06/15/2007
123 555 Hortence Hollywogger 06/16/2007 12/31/9999

The warehouse could then look up the correct name based on the date. In the case of our above example, the system would look up based on the date the report was signed, discover it was during the 07/17/2001-06/15/2007 time frame, and display her name correctly, leaving our auditor looking elsewhere for imperfections.

I mentioned there was a third type, which predicibly is named Type 3. With type 3, you don’t add new rows, but new columns to the table to handle changes. As you might guess, this can become quite a maintenance nightmare, and is rarely if ever used. Like Microsoft Bob ( http://en.wikipedia.org/wiki/Microsoft_bob ), Type 3 is probably best forgotten about.

To summarize, use Type 1 when tracking changes to the data is not necessary. Use Type 2 when you need to know when those changes were made. How do you decide?

That’s where your customers become involved. It will be necessary to know what they plan to do with that information. Yes, I know, talking to customers can be a scary experience, but hey just envision them using Microsoft Bob and you’ll be OK.

Advertisement

One thought on “Slowly Changing Dimensions

  1. As I come back in time and read this, I have one tip to pass along. In the example I used 12/31/9999 for the date. What’s actually more common is to leave the to (or end) date for the current record null.

    Then when doing most queries, it’s simple to get the most current row by adding “and to_date is null” to the where clause.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s