Category Archives: Business Intelligence

SQL Server 2008 February CTP 6 Released

By now you’ve probably heard the news that the latest SQL Server 2008 CTP is out. This release has a lot of new BI functionality included, such as:

  • Full Text Search
  • Support for Word export in Reporting Services
  • Data Visualization Enhancements
  • Report Design Enhancements in BIDS

I intend to install the latest CTP in a virtual environment and start testing out some of the cool new functionality. If you want to follow along, head over to http://connect.microsoft.com and sign up to play with the SQL Server 2008 CTP! Now if you’ll excuse me I need to get started installing!

SQL Server 2005 Full Text Searching at the Huntsville Alabama Code Camp

My third and final presentation for the Alabama Code Camp 6 is “Introduction to SQL Server Full Text Searching”. Here are the materials I’ll be using during the demo.

First, here is a PDF of the PowerPoint slides:

Full Text Search Power Points

Next, most of the demos used SQL statements. This PDF file has all of the SQL plus some associated notes.

Full Text Search Demo Scripts
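If you just want a taste before downloading the scripts, here is a minimal sketch of the kind of query the session covers. It assumes the AdventureWorks sample database with a full text catalog and index already built on Production.ProductDescription; the actual demo scripts may use different tables and search terms.

-- Exact-token matching with CONTAINS
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'aluminum');

-- Looser, natural-language matching with FREETEXT
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description, 'light weight frame');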

Finally, I did a WPF project that demonstrated how to call a full text search query from a WPF Windows application. Annoyingly enough WordPress (who hosts my blog) won’t let me upload ZIP files, so I renamed the extension to pdf. After you download the file to your drive, remove the .pdf and put the zip extension back on, then it should expand all the source for you correctly. (Yes, I know, I really need to get a host server for binaries, one of these days I’ll get around to it, but for today…)

Source for WPF Demo

Introduction to SQL Server Integration Services – Huntsville Alabama Code Camp

My second post of the day at Alabama Code Camp 6 in Huntsville is “Introduction to SQL Server Integration Services”.

The slide deck is here: Intro to SSIS Slide Deck

The “cheat sheet” or script I used to do the demo is here: Script for doing the SSIS Demo. You can step through it to recreate all of the things I did in the demo today.

Finally here is the finished project. It’s actually zipped, but my current host doesn’t like zip extensions so when you download it change the extension from txt back to ZIP. Finished SSIS Project

Steel City SQL Server Users Group – SQL Server 2005 Full Text Searching

Tonight I’ll be presenting “Introduction to SQL Server Full Text Searching” at the Steel City SQL Users Group. Here are the materials I’ll be using during the demo.

First, here is a PDF of the PowerPoint slides:

Full Text Search Power Points

Next, most of the demos used SQL statements. This PDF file has all of the SQL plus some associated notes.

Full Text Search Demo Scripts

Finally, I did a WPF project that demonstrated how to call a full text search query from a WPF Windows application. Annoyingly enough WordPress (who hosts my blog) won’t let me upload ZIP files, so I renamed the extension to pdf. After you download the file to your drive, remove the .pdf and put the zip extension back on, then it should expand all the source for you correctly. (Yes, I know, I really need to get a host server for binaries, one of these days I’ll get around to it, but for today…)

Source for WPF Demo

Look forward to seeing you at the New Horizons Training center tonight, 6:00 PM!

Don’t Uninstall Visual Studio 2005 Yet!

One of the great benefits of Visual Studio 2008 is its ability to target multiple .Net Frameworks. This means that, in theory, you could go ahead and begin using Visual Studio 2008 even though you still need to write apps that are Visual Studio 2005 / .Net 2.0 compliant. You might be tempted to go ahead and uninstall 2005, and that would be fine if you are only doing .Net development. But wait…

If you are still doing SQL Server BIDS (Business Intelligence Development Studio) work, then don’t uninstall Visual Studio 2005! Currently there’s no support in VS2008 for doing SQL Server 2005 BIDS development. If you uninstall VS2005 you won’t be able to do any more BIDS work. Trust me, I found out the hard way.

After uninstalling VS2005, I went to do a BIDS project and that’s when I got hit with the nasty surprise. The uninstall had also removed the Dev Environment that was shared with BIDS. I tried to rerun the install of my SQL Server Developer Edition, but for some reason it thought I wanted to upgrade. It kept giving me the message “You cannot upgrade a version of SQL Server from the GUI, you must use the command line.”

I finally had to reinstall VS2005, along with all its service packs. After that I was able to work on my BIDS projects again. So take it from me, if you are still doing SQL Server 2005 Business Intelligence projects, Visual Studio 2005 still has some life in it yet.

Does MacGyver Dream of Mark Miller?

For Christmas this year my family gave me a copy of MacGyver, Season 1 and 2 on DVD. My wife’s side of the family gave me a gift card which I used to get seasons 3 & 4. I’m a long time MacGyver fan, but my wife had only seen one or two episodes and my kids had never seen it at all, so we’ve been having a lot of fun watching. My favorite part of the series was the voice-overs, where you’d hear MacGyver’s voice as he explained what he was doing. It always started with some odd thought or story that led you through the thought process of how he came to the conclusion to build whatever wacky life saving device he was constructing.

I’ve come to realize that in some ways these blog entries are sort of like the MacGyver voice-overs, my inner thoughts laid out for you on the web. So I hope you’ll bear with me a few minutes while I relate a rather bizarre dream I had last night.

In my dream I’m standing on stage, in front of a fully loaded computer. It has all the bells and whistles, VS2008, SQL Server, and so on. On the other side of the stage, Mark Miller is there, in front of a similar computer. For those unfamiliar with Miller, he’s the genius behind CodeRush and RefactorPro, tools to help you write code faster. Some time back, when the product was first released, Miller used to challenge the audience to beat him in a code writing contest. His machine had CodeRush, and he would use chopsticks to write code; his competitor could use their fingers but did not have CodeRush on their machine. Of course Mark always won.

So sure enough, in my dream there’s Miller, chopsticks in hand ready to go, and I’m the guy going up against him. Our task is to take data from table A1, create a mirror table and name it table A2, and then move all million rows from A1 into A2. As you might guess, in my dream, I win. How?

Well, I didn’t write a program. Instead I first jumped into SQL Server Management Studio (SSMS) and used its script generating capability to produce a create table script. A quick search and replace later and boom, I’ve got table A2 created. I then jumped over to the Business Intelligence Development Studio (BIDS) to create a SQL Server Integration Services (SSIS) package to do the data move. (Yes, I probably could have used the script generation of SSMS again to generate an Insert script, but I was showing off.) In about three to four minutes I had accomplished the task and moved all the data while Miller was still pecking away at his computer with his chopsticks.
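For the curious, here is roughly what that boils down to in plain T-SQL. The table and column names are invented for the example, and in the dream the data move was an SSIS data flow rather than the INSERT…SELECT shown here.

-- Mirror the structure of A1 as A2 (what the SSMS-generated script,
-- after a quick search and replace on the table name, would produce)
CREATE TABLE dbo.A2
(
    Id        INT NOT NULL PRIMARY KEY,
    SomeValue VARCHAR(50) NULL,
    SomeDate  DATETIME NULL
);

-- Move the million rows from A1 into A2
INSERT INTO dbo.A2 (Id, SomeValue, SomeDate)
SELECT Id, SomeValue, SomeDate
FROM dbo.A1;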

I didn’t win because I’m a hot shot coder who is smarter than my competitor. Miller is a (some say mad) genius who can run circles around me in the coding world. As I told the folks in my dream, and as I’m telling you now: sometimes the best solution to a programming challenge isn’t to program at all! If you read yesterday’s post, Straining at Gnats, you may recall I said “…take some time. Push back from your computer and think for a moment. Think what the true outcome of your application is supposed to be. Not ‘what will the program do’ but ‘what will the program do for the user???’ Think about how best to achieve the user’s goals.”

When you are thinking about solutions, take a minute to look outside of your favorite programming language. Is it possible to achieve the goal without writing any code at all? What tool or tools do you have in your tool box that you can combine to get the job done? Here’s a great example that happened to me just before I took off on my holiday vacation.

As I’ve mentioned before, at work we have a Business Intelligence (BI) app that I’m the lead on. It imports data into a SQL Server 2005 warehouse via SSIS, then uses SQL Server Reporting Services (SSRS) to generate reports. The data is imported from a work order management system we bought many years ago. We also have some engineers who have a tiny little Microsoft Access database. This database has a primary key column; we’ll call it a part number for purposes of this example. There are three more columns holding data they need to know for each part but that is not found in our big system. They’d like to add this data to the reports our BI app generates. Two last pieces of information: first, they only update this data once per quarter, maybe; the last few years they have only done three updates a year. Second, the big system I mentioned is due to be replaced sometime in the next two years with a new system that will have their three fields.

A lot of solutions presented themselves to me. Write an ASP.Net app with a SQL Server back end, then use SSIS to move the data? Elegant, but a lot of work and very time consuming for a developer, especially for something that could go away in the near future. Write an SSIS package to pull data from Access? Risky, since we had no control over the Access database. A user could rename columns or move the database altogether, in either case trashing the SSIS package. Several other automation solutions were considered and rejected before the final solution presented itself: don’t automate at all.

Once per quarter I’ll simply have the engineers send me their Access database. Microsoft Access has a nice upsizing wizard that will move the table to SQL Server; I’ll use that to push the data onto the SQL Server Express instance that runs on my workstation. I’ll then use the script generating capability of SSMS to make an Insert script for the data, add a truncate statement to the top to remove the old data, and send it to the DBA to run. When I ran through it the first time my total time invested was less than ten minutes. In a worst case scenario I spend 40 minutes a year updating the data so it’s available for reporting. That’s far, far less time than I would have spent on any other solution.
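For anyone who wants to picture it, the script I hand to the DBA looks something like this. The table and column names are changed for the example; the INSERT statements are the ones SSMS generates from the upsized table.

-- Clear out last quarter's copy of the engineers' data
TRUNCATE TABLE dbo.EngineeringPartData;

-- Reload it, one generated INSERT per row
INSERT INTO dbo.EngineeringPartData (PartNumber, FieldOne, FieldTwo, FieldThree)
VALUES ('PN-1001', 'value a', 'value b', 'value c');
INSERT INTO dbo.EngineeringPartData (PartNumber, FieldOne, FieldTwo, FieldThree)
VALUES ('PN-1002', 'value a', 'value b', 'value c');
-- …and so on for the rest of the rows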

The next time you have a coding challenge, take a moment to “think like MacGyver”. Look at all the tools you have lying around your PC and see what sort of solutions you can come up with. Once you are willing to step outside the comfort zone of your favorite coding language, you may be able to come up with some creative, MacGyver-like solutions to your user’s problems.

 

PS – If you missed the announcement while on vacation, DevExpress just released CodeRush / RefactorPro 3.0. More than 150 refactorings and lots of new CodeRush features! Update yours today.

SQL Server Staging Tables – Truncate versus Delete

I’ve been reading a lot of books on SSIS (SQL Server Integration Services) and BI (Business Intelligence) over the course of the year. I want to pass along a little tidbit I haven’t seen in any of them. I’ll preface this by stating our staging tables and data warehouse are all in SQL Server 2005.

Our process is probably similar to others’: we pull the data in, and if the warehouse needs to be updated we place the data into a staging table. At the end of the process we do a mass update (via a SQL statement) from the staging table to the main data warehouse tables we use for reporting. Then we delete the records in the staging table. That seemed like a reasonable thing to do, but it wound up getting us into a lot of trouble. Over the course of the last few months our run times for the SSIS job have gotten slower and slower and sloooooooooooooower. The job was sometimes taking as long as 50 minutes to complete. One of our developers noticed the database seemed to be taking up a lot of space. He found a simple select count(*) was taking eight minutes on what was supposed to be an empty staging table.

Some research on the web explained what we were doing wrong. In one of my favorite SQL Server blogs, I want some Moore, blogger Mladen Prajdic has a great article on the differences between delete and truncate.

http://weblogs.sqlteam.com/mladenp/archive/2007/10/03/SQL-Server-Why-is-TRUNCATE-TABLE-a-DDL-and-not.aspx

The solution, then, was to perform not a delete but a truncate on our staging tables. We went ahead and manually issued a truncate on our staging tables and saw an immediate benefit. Our average run time went from 50 minutes to 8 minutes!
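The change itself is a one-liner in the cleanup step of the package; the staging table name below is just an example.

-- Before: fully logged, row by row, and the table kept growing over time
DELETE FROM dbo.StagingOrders;

-- After: deallocates the data pages and resets the table with minimal logging
TRUNCATE TABLE dbo.StagingOrders;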

I’m not sure why I haven’t seen this mentioned before; perhaps I just haven’t read the right blog or book yet. But I wanted to pass this along so you could be spared some of the headaches we went through. If your SSIS packages use SQL Server 2005 staging tables, use a truncate and not a delete to avoid speed issues. Alternatively, at least make sure to run truncates on a regular basis to keep those staging areas cleaned out!

SQL Server 2005 Reporting Services ReportViewer Control and IE7

I’ve spent the last few days pulling my hair out over an issue with my reports not rendering correctly inside an ASP.Net page under IE7. Like many companies we are preparing to roll out Internet Explorer 7 to all of the desktops. As part of that we’re doing regression testing on our applications, and in doing my tests I uncovered an issue with IE7 and our BI (Business Intelligence) solution. It is an ASP.Net 2.0 page that uses the ReportViewer control to display SQL Server 2005 Reporting Services reports.

In IE6, everything has been running fine, reports rendering correctly and users happy. In testing for IE7, we found the reports only drew enough to fill up the available space inside the browser, and then stopped. No scroll bars, and if you resized the browser it did not paint the interior correctly.

The solution, as it turns out, was pretty obscure. A co-worker found a thread at http://www.eggheadcafe.com/software/aspnet/27965101/re-problems-setting-webf.aspx that put me on the correct track. The fix required three minor tweaks to our code.

First, you need to remove the DOCTYPE line from your aspx page. Yes, that’s correct, the line that is automatically inserted when you create a new page. It should look like:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Find it and delete it.

Next, on the ReportViewer control, make sure the AsyncRendering property is set to False. With it set to True (the default), the report did render correctly, but instead of using the entire webpage to scroll, it put another scroll bar onto the report viewer itself, and you had to use that one to move through your report. Our testing found having two scroll bars (one for the page and one for the report) to be a bit confusing to the users. Try it both ways though and see what works best for you, as your mileage may vary. For us, False was the right setting.

In examining the AsyncRendering property, I found this article on MSDN which further confirmed the need to remove the DocType: http://msdn2.microsoft.com/en-us/library/ms252090(VS.80).aspx

In the final tweak, I made sure to set the width of the ReportViewer control to 100% and removed any setting of height. This allowed the page to scale automatically to the size it needed to be.

Since this was not an intuitive fix, I’m hoping getting the word out will help others and save them the two weeks of frustration I went through.

A Developer’s Guide to Installing SQL Server 2005 – Part 2 – Installing SQL Server 2005 Express Edition with Advanced Services SP2

OK, so you read my instructions in my previous post and downloaded the appropriate bits from http://msdn2.microsoft.com/en-us/express/bb410792.aspx. If you are planning to install, or have already installed, the Developer Edition, you only need to download the “Microsoft SQL Server 2005 Express Edition with Advanced Services” link. If you are only planning on doing the Express Edition, you’ll also want to download the link under it, “Microsoft SQL Server 2005 Express Edition Toolkit”. My instructions below only cover the database, not the toolkit.

OK, you’ve just downloaded the bits and it’s already time for the first “gotcha”. Before installing, you’ll want to make sure you have the IIS components installed on your machine. Odds are if you are an ASP.NET developer you probably already have them, but if not it’s a good idea to check.

In Vista, go to Control Panel, pick Programs and Features, then “Turn Windows Features on or off”. In the dialog that appears, scroll to Internet Information Services and expand the tree. Make sure all the items under “Web Management Tools” are checked. Now go down to “World Wide Web Services”. Under “Application Development Features” make sure .Net Extensibility, ASP.Net, and the ISAPI items are on. In “Common Http Features” check on everything but redirection. Everything under Health and Diagnostics should be on so you can log errors. Finally, under Security all you need on is Request Filtering.

XP is a little simpler. Under the Control Panel, go to Add or Remove Programs, then Add or Remove Windows Components. In the dialog that pops up, scroll down to Internet Information Services and check it. Click Details, and add FrontPage 2000 Server Extensions to what’s already checked. Then click OK a few times to get the updates installed.

In either situation, you may need your Windows disk handy to install the additional components.

OK, now you’re finally ready to run the setup. Double click your sqlexpr_adv.exe to see this screen. All you have to do is check the “I accept…” box to accept the license agreement, then click Next.

clip_image001

Now SQL Server will check to see if it has everything it needs. This is the screen where it would yell at you if you didn’t have IIS installed. Since you do, just click Next.

clip_image002

Now we get a nice welcome screen. I’m feeling all warm and fuzzy now! Click next to continue.

clip_image003

Here SQL Server is going to make sure you have everything you need installed. If there is anything missing you should fix it before you proceed. If you get all greens, you’re good to go.

clip_image004

OK, here we need to deviate from the standard “next next next” default model a lot of us are used to. Make absolutely sure you UNCHECK the box you see below, “Hide advanced configuration options.” We definitely want to see those advanced options.

clip_image005

Do you have it unchecked yet? If you don’t, you won’t get Reporting Services or Full Text Searching or the other cool features installed. Go ahead and fill out your name and company. Oh, and did I mention the box should be unchecked? If it is, like the screen below, you can click next.

clip_image006

OK, here we are on that advanced options screen. As you can see, all the cool toys are set to not install. Well that’s no fun. Go through each one and set it to install.

clip_image007

You can see below I have all the items checked on. I want to mention one thing. On the box I’m installing this on, I’ve already got the developer edition installed. So I don’t really need the Client Components. I’ve checked them on so I can show you something later. For now, click Next.

clip_image008

OK, here is your chance to give your server a name. Since everyone in the world uses SQLExpress, you should definitely not use it. Pick a name that’s logical, perhaps yourcomputername_express or computer_sql. Whatever you pick, change it now, then click Next.

clip_image009

Remember a few screens back when I said I was installing SQL Express on a machine that already had the Developer version installed? This is where it lets me know. The nice thing is the installer is smart enough not to choke. It simply says “Hey, I’ve already got this piece installed, so I’ll skip it.” This is handy so you don’t have to think too much about the pieces you’re installing: you can pick everything, and if something is already there the installer will let you know and then go on its merry way. Speaking of which, it’s time to get on our merry way by clicking Next.

clip_image010

OK, now it wants to know a few things about how to run. First it wants to know what user id to run the service as. The default you see below will work just fine. Next it wants to know what pieces it should run at startup. If you are going to be running this once in a blue moon, I would uncheck the SQL Server and Reporting Services options, then run them manually when you need them. On the other hand, if you are going to be in SQL Server nearly every day, like I am, it makes sense to leave these checked so they are started and ready for you. You’ll also want to leave these on if you are going to let other people connect to your box for testing reasons. Once you make your choices, pick Next.

clip_image011

OK, here SQL Server wants to know how you are logging into the server. DO NOT click next. Instead look at the next screen shot.

clip_image012

I don’t like using strictly Windows authentication; instead I much prefer Mixed Mode. Pick it, then enter a password you can use for the sa account. I prefer this method as it gives me a little flexibility in how I use the database. Make sure you pick a password you’ll remember! Enter it and, yes, you guessed it, click Next.

clip_image013

Next it’s asking how you want SQL Server to sort text. I always take the default here, and click next. If you have special sorting needs you can change this, but odds are you can click next too.

clip_image014

This next screen is another one of those “Gotchas”.

clip_image015

This is really important to check on, so you can easily administer your own database. Check it on so it looks like the screen below, and hit next.

clip_image016

OK, time for another important question. You can install Reporting Services using the default config, or you can install but not configure now. The idea behind this was to allow you to install but not configure, then copy the configuration from another existing server. This is ideal for the enterprise where we are setting up one server after another and want them all to be identical. However, for our workstation we can simply install using the default configuration, and hit next.

clip_image017

Here Microsoft would like your help. Any data Microsoft can gather about SQL Server will help make it a better product. In some small way you’ll be contributing to the next version. However, I do recognize there are some people or shops that are wary of anonymous statistics gathering. I turned it on for my install, but if you have any doubts leave it off like I show below. Once your decision is made, click Next.

clip_image018

It’s finally time to begin the install. SQL Server shows you your choices one last time, and gives a big “are you sure”. You are, so just hit next.

clip_image019

OK, go get us a couple of cups of coffee, we’ll sit here and wait a bit. This screen will keep us updated on the install process.

clip_image020

As things are completed, they turn green, as you see here. When everything is installed, the Next button will become enabled. Click it to proceed.

clip_image021

This is the last screen of the install, but don’t blow it off casually. It has some good info. Scroll the message window down to the bottom, as you see here, and you’ll find some good links. First, as you see on the screen, is a link to the Books Online. You should download and install these; they will become an often referenced companion as you learn SQL Server. Below that (just off the screen) are links to sample databases. If you are installing on a developer workstation and have the space, you should seriously consider installing these. Many, many authors (including myself, so you’ve been warned) will use these as material for their articles and samples. If nothing else it is very informative to see how Microsoft created a database.

clip_image022

That completes our installation of SQL Server Express with Advanced Services. But the fun is just beginning…

Just The Facts

I’ve spent the last few days covering dimension tables, but I should briefly mention that there are three types of fact tables as well. The first is the kind most will be familiar with, a transactional fact.

These are straightforward: every time an event occurs in your transaction system, a copy of that event is made in the fact table. Someone sells a product, updates an inventory record, or does some similar thing, and the event is reflected in your transactional fact table. Predominantly your warehouse will be made up of these types of tables.

Often your users will want to be able to quickly reference data that’s aggregated over time: sales to date, sales per week, and so forth. To make getting to these totals fast, the second and third types of fact tables are known as snapshot tables. In addition to speeding the return of data to the user, they also reduce the load on the server, as totals are aggregated once instead of each time the user runs a report.

The first of the snapshots (which is also the second type of fact table) is called a periodic snapshot fact table. At some defined point in time, a job will run that aggregates the transactional fact data and places it into the periodic snapshot fact table. Perhaps every Saturday night at midnight a job runs which tallies up the previous week’s sales into a sales-for-the-week fact table. Another classic example is the month ending totals after accounting closes out the business month.
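As a sketch, the Saturday night job for that weekly example might boil down to something like this; the table and column names are invented for illustration.

-- Periodic snapshot: roll last week's transactional facts into a weekly table
DECLARE @WeekEndingDate DATETIME;
SET @WeekEndingDate = '20071229';  -- in a real job this would be calculated

INSERT INTO dbo.FactSalesWeekly (WeekEndingDate, ProductKey, SalesAmount, QuantitySold)
SELECT @WeekEndingDate,
       ProductKey,
       SUM(SalesAmount),
       SUM(Quantity)
FROM dbo.FactSales
WHERE SaleDate >= DATEADD(day, -7, @WeekEndingDate)
  AND SaleDate <  @WeekEndingDate
GROUP BY ProductKey;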

The other type of snapshot, and the third type of fact table, is an accumulating snapshot. An accumulating snapshot is constantly being updated, instead of waiting for some point in time. Perhaps you have a trigger on the transaction sales table. Whenever a sale is made, that trigger updates the sales to date table.
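A bare-bones version of that trigger might look like the sketch below. The names are purely illustrative, and a real trigger would also need to handle updates, deletes, and products that don’t yet have a sales-to-date row.

-- Accumulating snapshot: keep a sales-to-date total current as sales arrive
CREATE TRIGGER trg_AccumulateSales ON dbo.FactSales
AFTER INSERT
AS
BEGIN
    UPDATE std
    SET    std.SalesToDate = std.SalesToDate + i.NewSales
    FROM   dbo.SalesToDate AS std
    JOIN  (SELECT ProductKey, SUM(SalesAmount) AS NewSales
           FROM inserted
           GROUP BY ProductKey) AS i
      ON   i.ProductKey = std.ProductKey;
END;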

Accumulating snapshots have the advantage of always being up to date and accurate. However, if you have a lot of them, or the transaction tables they are based on have frequent updates, they can become a major drain on your system resources. For that reason it’s best to limit them to transaction tables that are seldom updated. Let’s take two examples to help illustrate.

In an average car dealership, I’d guess that they might sell four cars a day. Since the transaction volume is so low, four a day, an accumulating snapshot would be entirely reasonable to track sales to date.

On the other hand, let’s look at a large site like woot.com, that deals with hundreds if not thousands of sales in a very short time span. In their case, an accumulating snapshot would be a major drain on system resources. Instead they would be much better served by a periodic snapshot.

A periodic snapshot, by the way, isn’t limited to a once a week or once a month situation, even though that’s how they are primarily used. You could update your periodic snapshots once a day, once an hour, or even once every 10 minutes if your server could support it. The “periodic” simply means that it runs on a regular schedule rather than constantly.

In addition to your transactional fact data, you should review your reporting needs to see if either of the two types of snapshots, periodic or accumulating, could be of use in your warehouse to speed data reporting and reduce the load on your servers.

Slowly Changing Dimensions

Since the facts in your fact tables mark discrete events in time, you would expect them to change pretty often. Your dimensions, on the other hand, tend to be pretty stable. An Employee dimension, for example, wouldn’t change very often. But, it will change. Employees move, get married, croak, or quit.

To handle these changes, data warehouses have adopted the concept of slowly changing dimensions. They are categorized into three types.

Type 1 is the basic type: with it, previous versions of the dimension record are overwritten with the latest data. Changes are not tracked. This can be OK, depending on what the business will be doing with the data.

Take the example of an Employee dimension. Our star employee, Hortence McGillicutty, moves and updates her address. In our warehouse, the only business task done with the address is mailings. We don’t care what her address was two years ago; we only need her current address for mailing her paycheck and other related items. Losing the history of her previous address is fine, so a Type 1 dimension is a good choice.
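In T-SQL terms, a Type 1 change is nothing more than an in-place overwrite; the table and column names here are hypothetical.

-- Type 1: overwrite the old value, no history is kept
UPDATE dbo.DimEmployee
SET StreetAddress = '123 New Street',
    City = 'Huntsville',
    State = 'AL'
WHERE EmpId = 555;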

Let’s say, however, that the business uses the Employee dimension much more extensively. Perhaps we have an application that on the left side of the screen displays information pertaining to the most recent financial statements. On the right is a scanned in image where our star Hortence McGillicutty has signed off on these statements.

A few months go by, and our heroine falls in love and gets married. If we were using a Type 1 dimension, the scanned image would still show her maiden name, but the display to the left would now show her new married name, Hortence Hollywogger. Not a big deal, you say? How about the first time a Sarbanes-Oxley auditor comes in, finds the discrepancy, and threatens big fines?

To solve this, data warehousing has Type 2 slowly changing dimensions. With Type 2, from date and to date fields are added to the dimensional table. These dates are the valid dates for that record. In our example, once Hortence got married we’d have two rows in the employee dimension:

Key  EmpId  Name                   From        To
123  555    Hortence McGillicutty  07/17/2001  06/15/2007
124  555    Hortence Hollywogger   06/16/2007  12/31/9999

The warehouse could then look up the correct name based on the date. In the case of our example above, the system would look up the record based on the date the report was signed, discover it fell in the 07/17/2001 to 06/15/2007 time frame, and display her name correctly, leaving our auditor to look elsewhere for imperfections.
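That lookup is just a range check against the from and to dates. Here is a sketch using the table above, with the date columns spelled out as FromDate and ToDate; the signed date would come from the report being displayed.

DECLARE @SignedDate DATETIME;
SET @SignedDate = '20050301';  -- the date the report was signed

-- Return the version of the employee row that was valid on that date
SELECT Name
FROM dbo.DimEmployee
WHERE EmpId = 555
  AND @SignedDate BETWEEN FromDate AND ToDate;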

I mentioned there was a third type, which is predictably named Type 3. With Type 3, you don’t add new rows but new columns to the table to handle changes. As you might guess, this can become quite a maintenance nightmare, and it is rarely if ever used. Like Microsoft Bob ( http://en.wikipedia.org/wiki/Microsoft_bob ), Type 3 is probably best forgotten.

To summarize, use Type 1 when tracking changes to the data is not necessary. Use Type 2 when you need to know when those changes were made. How do you decide?

That’s where your customers become involved. It will be necessary to know what they plan to do with that information. Yes, I know, talking to customers can be a scary experience, but hey just envision them using Microsoft Bob and you’ll be OK.

Conformed Dimensions

Yesterday I talked about the differences in dimensions versus facts. Today I’d like to extend that discussion with the importance of Conformed Dimensions.

One of the major advantages of a data warehouse is the ability to combine data from various, and sometimes vastly different, systems. Let’s take a common problem: your company has three different systems, sales, production, and purchasing. You’ve bought these from three different vendors, so unfortunately the part numbers used throughout the systems are not consistent, but you need to generate some reports showing how part x went into product y and was sold to customer z.

Unfortunately the part numbers are not consistent across the three systems because, as I mentioned, they came from three different vendors. What’s a programmer to do?

This is where conformed dimensions come in handy. In the part dimension table you create a surrogate key. This is the new primary key for a part, and it is simply a made-up value. Maybe you choose to use a GUID, or perhaps it’s just an auto incrementing integer. Regardless, this is now your new “part number” for all three systems once you bring the data into the warehouse.

You would add three more fields to the part dimension table. In addition to the primary key you would have a field “saleskey”, a field “productionkey”, and finally a “purchasingkey”. Then, when bringing your sales data into the warehouse, you look up the saleskey in the dimension table, get the primary key for the part, and place it in the fact sales table.
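Here is a rough sketch of that load step. All of the names are invented for the example, and in practice the key lookup is usually done with an SSIS Lookup transform rather than hand-written SQL.

-- Translate the sales system's own part number into the conformed surrogate key
-- while moving staged sales rows into the fact table
INSERT INTO dbo.FactSales (PartKey, SaleDate, Quantity, SalesAmount)
SELECT dp.PartKey,        -- the conformed surrogate key
       s.SaleDate,
       s.Quantity,
       s.SalesAmount
FROM dbo.StagingSales AS s
JOIN dbo.DimPart AS dp
  ON dp.SalesKey = s.PartNumber;  -- the sales system's part number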

Repeat with production and purchasing systems. By now you are beginning to get the idea. Because you have conformed the part key across the three fact tables, you can now draw reports using the new part key as a common thread to join the various fact tables together.

The result is known as a conformed dimension. ALL of the dimensions in your warehouse need to be conformed if you want to truly leverage the power of your warehouse. Employees, parts, customers, and locations are just a few examples of dimensions you’d want to conform.

As you can see, having conformed dimensions is key to the success of your warehouse. Failure to conform your dimensions means you lose one of the most powerful features of warehousing, the ability to produce reports across differing systems.

Dimensions versus Facts in Data Warehousing

I’ve made mention before of a large data warehousing project I’ve been involved with, using the SQL Server 2005 tools. Like a lot of developers on a lot of projects, I was thrown in the deep end and had to learn a lot of new technologies in short order. Fortunately I’m a glutton for punishment.

I found a lot of material on the various tools, and quickly became competent in their use. I also quickly got up to speed on star schema versus snowflake schema, and the divvying up of data into facts and dimensions.

The thing that always puzzled me was how you decide what goes into your facts versus what goes into your dimensions. Having a short breather, I decided to do some reading to make sure my understanding of the theory was up to my understanding of the tools.

The best explanation I’ve found is in “The Microsoft Data Warehouse Toolkit” by Joy Mundy and Warren Thornthwaite. http://shrinkster.com/r4n (Good book, highly recommend it, and standard disclaimer: I don’t get any kickbacks, so buy it wherever you want.) It covers the Kimball method for warehousing. To quote…

It may help to think of dimensions as things or objects. A thing such as a product can exist without ever being involved in a business event.

Ah, grasshopper, enlightenment begins.

As the book describes, a dimension is your noun. It is something that can exist independent of a business event, such as a sale. Products, employees, and equipment are all things that exist. A dimension either does something, or has something done to it.

Employees sell, customers buy. Employees and customers are examples of dimensions, they do.

Products are sold, they are also dimensions as they have something done to them.

Facts, to carry on another concept from the book, are the verbs. An entry in a fact table marks a discrete event that happens to something from a dimension table. A product sale would be recorded in a fact table. The event of the sale would be noted by what product was sold, which employee sold it, and which customer bought it. Product, Employee, and Customer are all dimensions that describe the event, the sale.

In addition fact tables also typically have some kind of quantitative data. The quantity sold, the price per item, total price, and so on.
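Putting the nouns and verbs together, a bare-bones star schema for that sale might look like the following; the column lists are trimmed way down for illustration.

-- Dimensions: the nouns
CREATE TABLE dbo.DimProduct  (ProductKey  INT PRIMARY KEY, ProductName  VARCHAR(50));
CREATE TABLE dbo.DimEmployee (EmployeeKey INT PRIMARY KEY, EmployeeName VARCHAR(50));
CREATE TABLE dbo.DimCustomer (CustomerKey INT PRIMARY KEY, CustomerName VARCHAR(50));

-- Fact: the verb, one row per sale, carrying the quantitative measures
CREATE TABLE dbo.FactSales
(
    ProductKey   INT NOT NULL REFERENCES dbo.DimProduct(ProductKey),
    EmployeeKey  INT NOT NULL REFERENCES dbo.DimEmployee(EmployeeKey),
    CustomerKey  INT NOT NULL REFERENCES dbo.DimCustomer(CustomerKey),
    SaleDate     DATETIME NOT NULL,
    Quantity     INT NOT NULL,
    PricePerItem MONEY NOT NULL,
    TotalPrice   MONEY NOT NULL
);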

Knowing this makes it much clearer in my mind how to start designing my own warehouses. Or at least I’m ready to take the first step.