Dimensions versus Facts in Data Warehousing

I’ve made mention before of a large data warehousing project I’ve been involved with, using the SQL Server 2005 tools. Like a lot of developers on a lot of projects, I was thrown in the deep end and had to learn a lot of new technologies in a short order of time. Fortunately I’m a glutton for punishment.

I found a lot of material on the various tools, and quickly became competent in their use. I also quickly got up to speed on star schema versus snow flake schema, and the divvying up of data into facts and dimensions.

The thing that always puzzled me was, how do you decide what goes into facts versus what goes into your dimensions. Having a short breather I decided to do some reading up to make sure my understanding of theory was up to my understanding of the tools.

The best explanation I’ve found is in “The Microsoft Data Warehouse Toolkit” by Joy Mundy and Warren Thornthwaite. http://shrinkster.com/r4n (Good book, highly recommend it, and standard disclaimer I don’t get any kickbacks so buy it where ever you want.) It covers the Kimball method for warehousing. To quote…

It may help to think of dimensions as things or objects. A thing such as a product can exist without ever being involved in a business event.

Ah, grasshopper, enlightenment begins.

As the book describes, a dimension is your noun. It is something than can exist independent of a business event, such as a sale. Products, employees, equipment, are all things that exist. A dimension either does something, or has something done to it.

Employees sell, customers buy. Employees and customers are examples of dimensions, they do.

Products are sold, they are also dimensions as they have something done to them.

Facts, to carry on another concept from the book are the verb. An entry in a fact table marks a discrete event that happens to something from the dimension table. A product sale would be recorded in a fact table. The event of the sale would be noted by what product was sold, which employee sold it, and which customer bought it. Product, Employee, and Customer are all dimensions that describe the event, the sale.

In addition fact tables also typically have some kind of quantitative data. The quantity sold, the price per item, total price, and so on.

Knowing this makes it much clearer in my mind how to start designing my own warehouses. Or at least ready to take the first step.

About these ads

30 Responses to “Dimensions versus Facts in Data Warehousing”

  1. rasika Says:

    Great. Thanks a lot.

  2. infonitive Says:

    Interesting views on data warehousing. You might find http://www.infonitive.com an interesting forum.

  3. Ruchi Says:

    Thanks, this is very precise info.

  4. Kewal Says:

    good stuff.. easy to understand!

  5. Srini Says:

    Excllent info

  6. dharminder Says:

    is gud for me as a begginer…can u please suggest me some further steps on this way

  7. deep Says:

    Man tat was some gud explaination…Thx bro !!!

  8. gurpreetseehra Says:

    thanks a lot, that hepls..

  9. samuel Says:

    brilant … i mean now i have an idea about this new aproch

  10. Brad Says:

    The “Quantative Data” you refer to that is mentioned in Facts which are not listed as a Dimension are normally called “Measures” (you describe them above as quantity sold, the price per item, total price, etc).

  11. Kanika Gupta Says:

    Very good explanation for Non techie ppl .

  12. Mona Has Says:

    It is perfect.Thank you so much.
    Cannot be better than this.

  13. Badminton Training Says:

    Nice explanation. I like your examples, helps me understand quickly what’s a dimension and what’s a fact.

  14. salone boy Says:

    thanks. very enlightening. and thank u to Brad too!

  15. Stuart Cowen Says:

    Awesome springboard of an explanation!

  16. snehal Says:

    hello thanks its really nice but can you explain more as giving different examples so that i can get clear idea about it.

  17. sanoka Says:

    Sanoka
    Thanks

  18. Amit Says:

    I must say that tabular comparison could have helped me alot

  19. Pipit Says:

    Great explanation, thank you

  20. Lisa Says:

    Can dimension rows (nouns) exist without a FK reference in the fact table? As an example a customer may be a dimension but may not have purchased yet. Would/should the customer be in the dimension for customer only and not on the fact table?

  21. arcanecode Says:

    Lisa, yes it’s entirely possible to have a row in the Dimension table that is not used in the Fact table. If you were to use the data to build a cube, the orphaned dimension wouldn’t appear as the core of a cube is the fact table. However you could still reference it from more traditional reporting that connects directly to the database as opposed to a cube.

  22. simon Says:

    Thanks for the information and also data warehousing site, http://dwhlaureate.blogspot.in/ , really helpful

  23. Rohit Says:

    The explanation is good!!
    I have a doubt
    We have a staging layer which extracts tables from different source.It is just a copy of other sources.
    In the workflows , facts are being run before dimensions.
    This is against the normal view that dimensions should run before facts.
    Please suggest what could be the reason?

    Thanks in advance

  24. sadashiva swamy H B Says:

    gr8 help….thanks a lot

  25. karthik Says:

    thanks. short and sweet.

  26. sekhar Says:

    Simple to understand


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 106 other followers

%d bloggers like this: