I’ve made mention before of a large data warehousing project I’ve been involved with, using the SQL Server 2005 tools. Like a lot of developers on a lot of projects, I was thrown in the deep end and had to learn a lot of new technologies in a short order of time. Fortunately I’m a glutton for punishment.
I found a lot of material on the various tools, and quickly became competent in their use. I also quickly got up to speed on star schema versus snow flake schema, and the divvying up of data into facts and dimensions.
The thing that always puzzled me was, how do you decide what goes into facts versus what goes into your dimensions. Having a short breather I decided to do some reading up to make sure my understanding of theory was up to my understanding of the tools.
The best explanation I’ve found is in “The Microsoft Data Warehouse Toolkit” by Joy Mundy and Warren Thornthwaite. http://shrinkster.com/r4n (Good book, highly recommend it, and standard disclaimer I don’t get any kickbacks so buy it where ever you want.) It covers the Kimball method for warehousing. To quote…
“It may help to think of dimensions as things or objects. A thing such as a product can exist without ever being involved in a business event.”
Ah, grasshopper, enlightenment begins.
As the book describes, a dimension is your noun. It is something than can exist independent of a business event, such as a sale. Products, employees, equipment, are all things that exist. A dimension either does something, or has something done to it.
Employees sell, customers buy. Employees and customers are examples of dimensions, they do.
Products are sold, they are also dimensions as they have something done to them.
Facts, to carry on another concept from the book are the verb. An entry in a fact table marks a discrete event that happens to something from the dimension table. A product sale would be recorded in a fact table. The event of the sale would be noted by what product was sold, which employee sold it, and which customer bought it. Product, Employee, and Customer are all dimensions that describe the event, the sale.
In addition fact tables also typically have some kind of quantitative data. The quantity sold, the price per item, total price, and so on.
Knowing this makes it much clearer in my mind how to start designing my own warehouses. Or at least ready to take the first step.