Dimensions versus Facts in Data Warehousing

I’ve made mention before of a large data warehousing project I’ve been involved with, using the SQL Server 2005 tools. Like a lot of developers on a lot of projects, I was thrown in the deep end and had to learn a lot of new technologies in a short order of time. Fortunately I’m a glutton for punishment.

I found a lot of material on the various tools, and quickly became competent in their use. I also quickly got up to speed on star schema versus snow flake schema, and the divvying up of data into facts and dimensions.

The thing that always puzzled me was, how do you decide what goes into facts versus what goes into your dimensions. Having a short breather I decided to do some reading up to make sure my understanding of theory was up to my understanding of the tools.

The best explanation I’ve found is in “The Microsoft Data Warehouse Toolkit” by Joy Mundy and Warren Thornthwaite. http://shrinkster.com/r4n (Good book, highly recommend it, and standard disclaimer I don’t get any kickbacks so buy it where ever you want.) It covers the Kimball method for warehousing. To quote…

It may help to think of dimensions as things or objects. A thing such as a product can exist without ever being involved in a business event.

Ah, grasshopper, enlightenment begins.

As the book describes, a dimension is your noun. It is something than can exist independent of a business event, such as a sale. Products, employees, equipment, are all things that exist. A dimension either does something, or has something done to it.

Employees sell, customers buy. Employees and customers are examples of dimensions, they do.

Products are sold, they are also dimensions as they have something done to them.

Facts, to carry on another concept from the book are the verb. An entry in a fact table marks a discrete event that happens to something from the dimension table. A product sale would be recorded in a fact table. The event of the sale would be noted by what product was sold, which employee sold it, and which customer bought it. Product, Employee, and Customer are all dimensions that describe the event, the sale.

In addition fact tables also typically have some kind of quantitative data. The quantity sold, the price per item, total price, and so on.

Knowing this makes it much clearer in my mind how to start designing my own warehouses. Or at least ready to take the first step.


31 thoughts on “Dimensions versus Facts in Data Warehousing

  1. The “Quantative Data” you refer to that is mentioned in Facts which are not listed as a Dimension are normally called “Measures” (you describe them above as quantity sold, the price per item, total price, etc).

  2. Can dimension rows (nouns) exist without a FK reference in the fact table? As an example a customer may be a dimension but may not have purchased yet. Would/should the customer be in the dimension for customer only and not on the fact table?

  3. Lisa, yes it’s entirely possible to have a row in the Dimension table that is not used in the Fact table. If you were to use the data to build a cube, the orphaned dimension wouldn’t appear as the core of a cube is the fact table. However you could still reference it from more traditional reporting that connects directly to the database as opposed to a cube.

  4. The explanation is good!!
    I have a doubt
    We have a staging layer which extracts tables from different source.It is just a copy of other sources.
    In the workflows , facts are being run before dimensions.
    This is against the normal view that dimensions should run before facts.
    Please suggest what could be the reason?

    Thanks in advance

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s