I’ve made mention before of a large data warehousing project I’ve been involved with, using the SQL Server 2005 tools. Like a lot of developers on a lot of projects, I was thrown in the deep end and had to learn a lot of new technologies in a short order of time. Fortunately I’m a glutton for punishment.
I found a lot of material on the various tools, and quickly became competent in their use. I also quickly got up to speed on star schema versus snow flake schema, and the divvying up of data into facts and dimensions.
The thing that always puzzled me was, how do you decide what goes into facts versus what goes into your dimensions. Having a short breather I decided to do some reading up to make sure my understanding of theory was up to my understanding of the tools.
The best explanation I’ve found is in “The Microsoft Data Warehouse Toolkit” by Joy Mundy and Warren Thornthwaite. http://shrinkster.com/r4n (Good book, highly recommend it, and standard disclaimer I don’t get any kickbacks so buy it where ever you want.) It covers the Kimball method for warehousing. To quote…
“It may help to think of dimensions as things or objects. A thing such as a product can exist without ever being involved in a business event.”
Ah, grasshopper, enlightenment begins.
As the book describes, a dimension is your noun. It is something than can exist independent of a business event, such as a sale. Products, employees, equipment, are all things that exist. A dimension either does something, or has something done to it.
Employees sell, customers buy. Employees and customers are examples of dimensions, they do.
Products are sold, they are also dimensions as they have something done to them.
Facts, to carry on another concept from the book are the verb. An entry in a fact table marks a discrete event that happens to something from the dimension table. A product sale would be recorded in a fact table. The event of the sale would be noted by what product was sold, which employee sold it, and which customer bought it. Product, Employee, and Customer are all dimensions that describe the event, the sale.
In addition fact tables also typically have some kind of quantitative data. The quantity sold, the price per item, total price, and so on.
Knowing this makes it much clearer in my mind how to start designing my own warehouses. Or at least ready to take the first step.
31 thoughts on “Dimensions versus Facts in Data Warehousing”
Great. Thanks a lot.
Interesting views on data warehousing. You might find http://www.infonitive.com an interesting forum.
Thanks, this is very precise info.
good stuff.. easy to understand!
is gud for me as a begginer…can u please suggest me some further steps on this way
Man tat was some gud explaination…Thx bro !!!
thanks a lot, that hepls..
brilant … i mean now i have an idea about this new aproch
Thank you for more explanations.
The “Quantative Data” you refer to that is mentioned in Facts which are not listed as a Dimension are normally called “Measures” (you describe them above as quantity sold, the price per item, total price, etc).
Very good explanation for Non techie ppl .
It is perfect.Thank you so much.
Cannot be better than this.
Nice explanation. I like your examples, helps me understand quickly what’s a dimension and what’s a fact.
thanks. very enlightening. and thank u to Brad too!
Awesome springboard of an explanation!
hello thanks its really nice but can you explain more as giving different examples so that i can get clear idea about it.
I must say that tabular comparison could have helped me alot
Great explanation, thank you
Can dimension rows (nouns) exist without a FK reference in the fact table? As an example a customer may be a dimension but may not have purchased yet. Would/should the customer be in the dimension for customer only and not on the fact table?
Lisa, yes it’s entirely possible to have a row in the Dimension table that is not used in the Fact table. If you were to use the data to build a cube, the orphaned dimension wouldn’t appear as the core of a cube is the fact table. However you could still reference it from more traditional reporting that connects directly to the database as opposed to a cube.
Thanks for the information and also data warehousing site, http://dwhlaureate.blogspot.in/ , really helpful
The explanation is good!!
I have a doubt
We have a staging layer which extracts tables from different source.It is just a copy of other sources.
In the workflows , facts are being run before dimensions.
This is against the normal view that dimensions should run before facts.
Please suggest what could be the reason?
Thanks in advance
gr8 help….thanks a lot
thanks. short and sweet.
Simple to understand
Thanks for the simplified explanation!