Dimensions versus Facts in Data Warehousing

I’ve made mention before of a large data warehousing project I’ve been involved with, using the SQL Server 2005 tools. Like a lot of developers on a lot of projects, I was thrown in the deep end and had to learn a lot of new technologies in a short order of time. Fortunately I’m a glutton for punishment.

I found a lot of material on the various tools, and quickly became competent in their use. I also quickly got up to speed on star schema versus snow flake schema, and the divvying up of data into facts and dimensions.

The thing that always puzzled me was, how do you decide what goes into facts versus what goes into your dimensions. Having a short breather I decided to do some reading up to make sure my understanding of theory was up to my understanding of the tools.

The best explanation I’ve found is in “The Microsoft Data Warehouse Toolkit” by Joy Mundy and Warren Thornthwaite. http://shrinkster.com/r4n (Good book, highly recommend it, and standard disclaimer I don’t get any kickbacks so buy it where ever you want.) It covers the Kimball method for warehousing. To quote…

It may help to think of dimensions as things or objects. A thing such as a product can exist without ever being involved in a business event.

Ah, grasshopper, enlightenment begins.

As the book describes, a dimension is your noun. It is something than can exist independent of a business event, such as a sale. Products, employees, equipment, are all things that exist. A dimension either does something, or has something done to it.

Employees sell, customers buy. Employees and customers are examples of dimensions, they do.

Products are sold, they are also dimensions as they have something done to them.

Facts, to carry on another concept from the book are the verb. An entry in a fact table marks a discrete event that happens to something from the dimension table. A product sale would be recorded in a fact table. The event of the sale would be noted by what product was sold, which employee sold it, and which customer bought it. Product, Employee, and Customer are all dimensions that describe the event, the sale.

In addition fact tables also typically have some kind of quantitative data. The quantity sold, the price per item, total price, and so on.

Knowing this makes it much clearer in my mind how to start designing my own warehouses. Or at least ready to take the first step.

Arcane Get-A-Ways

It’s always great to get away with your coworkers every so often, in a situation away from the office. That’s where I’ve been the last few days, in the mountains of north Georgia. Very pretty country, and our big team building event was a 6 mile canoe ride.  

Unfortunately there was no internet access in the lodge we were in. Seems it was being renovated and they hadn’t gotten around to hooking it back up. They barely got the new carpet installed before we arrived. (I was the first one there, and as I was walking in the carpet layer was driving away).

It was a weird experience being internetless for several days. It’s amazing how used to connectivity we’ve all become in such a short time.

Speaking of connectivity, I now have a Pownce account. http://pownce.com/arcanecode/ . I’ve only just started it, will try to figure out how to most effectively use it over the next few weeks. Meanwhile if you have a Pownce account feel free to send me a friend invite.

I also read that WordPress now has a good integration with Facebook, and am considering a Facebook account. Let me know your thoughts.

Finally, I’ve updated the Arcane Lessons page ( https://arcanecode.wordpress.com/arcane-lessons/ ) with more lesson plans, be sure to check it out.

SSIS Issue From Monday

Just a quick update today, due to being swamped with meetings and travel. We managed to resolve the SSIS issue from Monday by applying SQL Server 2005 Service Pack 2.  So if you encounter a “Scripts failed to load” error, verify which service pack your SQL Server is on.

WOW!

With taking vacation the first week of the month, and having three different user group meetings last week (one of which I presented at) I’ve gotten a little behind in my podcast listening. After work yesterday I was cutting my grass and using the time to finally get up to date on my listening.

I was putting away the mower when last week’s Run As Radio came on my player. You can imagine my shock when all of a sudden I hear Greg and Richard say my name! I’m surprised I didn’t fall over right there in the pile of grass clippings.

Back when they were at Tech Ed I had done a small blog post on the show, which Greg found and you can read at https://arcanecode.wordpress.com/2007/06/15/arcane-fun-fridays-run-as-radio/ . Greg also sent me a nice e-mail.

I took the time to respond, and then to be honest got busy and forgot about it. So it was quite a rush to hear my message read on the air. To address a few quick points about it…

Yes, I do admit I was simplifying a bit when I talked about server admins/DBA and their domains, but I still feel that it’s the developer who has to glue all of those services together. Don’t overlook those guys though, if you work with them they can be a developer’s best friend. I know at our work place I’ve worked quite closely with the DBA group, they are extremely helpful when it comes to things like database design and optimizing queiries.

I also want to reiterate my point that all developers should have at least a basic understanding of the way other areas of IT work. That’s why I was so thrilled when Pwop came out with Run As Radio, it is a great resource to fill in that gap in my education. Now if we could just get them to do a show on SQL Server!

If you want to listen to the show yourself, it’s show number 14:

http://www.runasradio.com/default.aspx?showNum=14

Finally I’d like to announce a new feature of the blog, Arcane Lessons. Because of the technical nature of my writing, a lot of topics get broken up over multiple posts, over several days. It can make it a bit difficult to make sure you’ve gotten everything.

To solve that I’ve added a new page I’m calling Arcane Lessons. It lists the topic, then all the posts I’ve done for that topic. Take a look by clicking the “Arcane Lessons” link at the top of the page, or add https://arcanecode.wordpress.com/arcane-lessons/ to your favorites.

Currently I have three up there now, “Getting Started with SQL Server 2005 Full Text Searching”, “Windows Services in C#”, and “Event Logging”. Be sure to check back, I have many more I’ll be adding over the next few weeks such as the “SQL Server Compact Edition” series, or the “Installing Ubuntu under VirtualPC” series.

SQL Server Integration Services and the “The script files failed to load” Error

I have an SSIS job that has been running on our test server since April, with no issues. Last week it just quit working. About the fourth package in I started getting “The script files failed to load” error when loading a very simple VB Script inside a package.

A web search led me to this forum page:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=267047&SiteID=1

To save you from wading through all the messages, the two causes of this error could be 1) Package’s PreCompile was set to False, or 2) I had breakpoints in the package.

So I check, and nope my PreCompile was set to true (the default) and there were no breakpoints. In addition, most of the people posting in the forum link above couldn’t get their packages to run in the first place, this didn’t seem to occur in packages like mine that had been running fine for a while.

Next I thought, OK what about environmental issues? Mining our extensive logs I was able to determine the time when one moment the scripts work, then 15 minutes later they no longer worked.

With that exact 15 minute interval known, the DBA team reviewed what had happened on the server, and sure enough a security patch for the .Net 1.1 Framework had been applied on the server during that exact time. Bingo!

It took us 2 days to work through the issue, the team, and I have to be honest and include myself here, had the thought process “well, the error is just on the test server, we’ll get to it when we can.” Since it was just a test server, and we weren’t testing anything at the time, we weren’t stressing over it.

We should have been. The very same patch was scheduled to be rolled to our production server just an hour and a half after we found it. Had that scheduled deployement occurred we would have been in deep diaper filling.

Let me make a quick distinction here, in that it’s not necessarily fixing the issue that’s urgent, but identifiying it. Once you have it identified, you can control it. By not quickly identifying the issue, we nearly rolled the same patch into production.

So the first point here, if your SSIS package which has been running for a while suddenly starts getting “The script files failed to load” error, check to see if any .Net Framework 1.1 patches were applied.

Second, this shows you how important good logging can be. With our extensive logging I was able to determine with a fifteen minute window when the problem occurred.

Finally, never treat issues that occur on your test server as if they are not important. We were quite lucky, narrowly avoiding disaster on our production server. Identify the cause quickly, so you can control it.

Update: We found a fix that worked, see https://arcanecode.wordpress.com/2007/07/18/ssis-issue-from-monday/ for more detail.

Being a Better Developer… In 6 Months

Scott Hanselman’s show this week was killer. (http://www.hanselminutes.com/default.aspx?showID=90 ). In it, Scott and Carl discuss a thread going around the internet, namely how to become a better developer in six months. They had some excellent ideas, some of which they were passing along from other posters, some were theirs. The post that seems to have started the whole thread was done buy a guy named Justice Gray, back in April. http://graysmatter.codivation.com/HowIAmBecomingABetterDeveloperPart1OfInfinity.aspx or http://shrinkster.com/qvx . Just recently he posted a follow up at http://graysmatter.codivation.com/AnUpdateOnGoals.aspx or http://shrinkster.com/qvy .

I like the whole concept, and am going to implement my own version of it. And the first step is to declare what I’m going to do, so without further ado…

I’m going to start by reading a chapter a week from a book. Now, I know that doesn’t sound like much compared to the plans of others to read a book a week, but it leads to my next step…

I will work all the code samples in the book. Reading is one thing, but doing is even better. Personally, I find I get a better understanding when I actually type in the code samples and run them. And not just run what’s in the book, but tweak it, experiment with it. And then what will I do with my knowledge?

I will teach what I learn. The best way to learn is to teach. I’ll blog, talk with my co-workers over lunch, give presentations, but in some way I will give back what I learned. But I won’t stop this learning process with just books.

I’ll increase my listening of podcasts or videocasts. I recently got an inexpensive MP3/WMA player, which I load up with podcasts. This left the 1 gig card on my iPaq free, which I’ve loaded some videos on. Since the iPaq is portable, it increases my ability to watch these videos. Since I’ve blogged so much about podcasts in the past, I shan’t continue talking about them.

I’ll create at least one new presentation and give it to a user group. Again, the best way to learn is to teach, and there’s no better place than with your peers at your local user group.

I’ll look at the source code for an open source project. This is one I really loved from the show. Look at someone else’s code, see how it works, step through it. Right now I’ve got several in mind, first is the Paint.Net project ( http://www.getpaint.net/index2.html ) since I think the graphics would be interesting, and graphics aren’t something I normally get to play with at work.

Next is RSSBandit, http://www.rssbandit.org with the source at http://sourceforge.net/project/showfiles.php?group_id=96589&package_id=103276 .The networking concepts in there should be quite useful in many instances. Finally is SharpDevelop (http://www.icsharpcode.net/OpenSource/SD/ ). It’d be interesting to see how an IDE works. I’m not sure which of the three I’ll look at, but these are on my short list.

I will learn a brand new or little used technology. There’s a lot of new technology out there, or tech I don’t read much about. Working a lot with the SQL Server BI (Business Intelligence) tools, the upcoming SQL Server 2008 sounds interesting. Of course there’s Visual Studio 2008. And XAML promises to be a hot topic, between WPF and Silverlight I think this will eventually be a “must” for everyone. Those are just some examples, find something that fascinates you and go learn.

The final two items on my list are suggestions from my manager, who my kids have dubbed “Mighty Mike”. I thought these were really good.

I will learn more about the business. No, not the business of programming, although that’s certainly important. I’m talking about what my company does. Most developers aren’t in a job where their company produces software. Instead our programming efforts help support the production of some product, which our company sells. I will learn more about that product, how it’s produced, what processes apply, and what the difficulties are. And finally….

I will get to know my customers. By that, I mean the people who are using, or are affected by the software I write. For most of us, those will be other employees of our company. Meet these people. Get to know them. Buy them a cup of coffee. Take a non-IT coworker to lunch every so often. Setup a half hour meeting with them every so often to learn and understand more about their job. Find out what their pain points are, find ways to solve their problems, offer them solutions to make their jobs better.

Whew, that’s quite the list. It will take a lot of balancing of my time to carry this off, but at the end of it I’ll be a better programmer.

To wrap this up, I’m supposed to tag four other developers, to challenge them as well. So here goes…

First is Jeff Barnes, http://jeffbarnes.net/portal/blogs/jeff_barnes/default.aspx . Payback time! (He knows why, heh heh heh).

Next is Todd Miranda, Birmingham’s newest MVP. Congrats Todd! http://blog.nxtdimension.com/

I think my next victim will be my brother-in-law, Dougal. Even though he’s not fortunate enough to work with .Net, he at least got to do some cool stuff with WordPress. http://dougal.gunters.org/

My final pick is that perfect blend of lunatic and coding genius, Mark Miller. It’s his fault I got deeply involved in coding again. I was thinking of getting into project management, but after seeing his talk at VSLive 2005 I got so enthused about coding again I jumped in with both feet and here I am. So how about it Miller, put down that McGriddle and blog something! http://www.doitwith.net/

An now I challenge you, the reader of this post to go out and be a better developer. Post a link to your development plan. If you don’t have a blog of your own, feel free to post your plan below. Look at others, take the best of the ideas that will work for you.

Now if you’ll excuse me, I’ve got a lot of work to do!

The Great Font Hunt

My brother in law, Dougal has an interesting post about typefaces for webpages. ( http://dougal.gunters.org/blog/2007/07/05/typography-design-patterns-for-the-web ). You may have seen color charts for web page design, where it lists a primary color and other colors that compliment it. What Dougal is looking for is a similar chart for fonts. Anyone know of anything? If so post a comment on his blog or mine (or both!) and let us know.