Arcane Reasons for Data Warehousing

I may have mentioned that of recent I have been doing a lot of work in the Data Warehousing arena. Today I met with some IT folks from another branch of the company who are considering a reporting strategy for their area. One of the people I was meeting with asked me “With so much data available, how do you decide what data to put in the warehouse first, versus what data do you leave in the application, either permanently or until a later point?

Great question, and I thought that you too might be interested in the answer.

Interapplication Reports. Historically trying to combine data from multiple applications has been painful, to put it nicely. Clearly than this turns out to be one of the most compelling reasons for data warehousing, to house data from multiple applications and allow users to easily combine that data into singular reports.

Phasing Out Historical or Ad-Hoc Systems. Accounting systems seem to have an existence all their own. For various reasons they live well beyond their normal lifespan. We have a system at work, written in an old DOS based reporting tool that dates back to the late 1980’s. Over the years it’s been used to do reporting from other systems. As it turns out it has some issues with Vista, and will need replacing. Rather than getting yet another system, we plan to replace its reports with ones from our data warehouse.

Friendlier Reporting. Often when I see databases, the field names are quite cryptic. Names like fklnam (foreign name last name) and accsbcd (account sub code) litter databases. It’s difficult enough for IT Professionals to decipher the field name mayhem, but asking users to do so just to create a few ad-hoc reports can be asking far too much. Not to mention the sometimes bizarre seeming relationships between tables.

Moving to a data warehouse allows you to give much saner, user friendly names to your data. In addition you can flatten out some of the tables, simplifying the relationship structures significantly.

Production Server Load Reduction. Production systems are usually optimized for dealing with single records at a time. As a result, searching through and retrieving data for large quantities of data can be resource intensive on the production system. Shifting reporting to a warehouse means a reduced load for the production system. In addition you elimante the chance that malformed SQL from some ad-hoc query can cripple your production system.

Ease of Offline Maintenance for Production Systems. Finally, having a warehouse makes it easier to take production systems offline for maintenance. If users know they can still get to their data via the warehouse, they will be less concerned about their production system going offline for work, which in turn makes it easier to schedule such work. If you have a system that requires frequent maintenance, your users will be less likely to give you grief if their data is available elsewhere.

Those are my primary ways in which we decide which data is targeted for inclusion into the warehouse. If you need to combine data from multiple applications, have older systems that need replacement, have cryptic field names or complex table relationships, need to reduce the load on your production server, or have systems that need frequent maintenance then consider those systems first for inclusion into a data warehouse.

I’d be curious to hear your comments on your strategies for determining inclusion into your own data warehouse.


3 thoughts on “Arcane Reasons for Data Warehousing”

  1. In my experience, the most significant motivation for a data warehouse is for the purpose of phasing out historical data from your production database. I was recently involved with a large system where some of the most important tables were introduced to tens of millions of new records per month. For some people, I am sure this sounds miniscule. However, it has the potential to become a headache rather quickly in a system with extremely heavy user load. And, it doesn’t help matters that the database is also tapped for adhoc reporting. After the system went live, one of the top priorities was the development of a data warehouse to ease the burden on the production database. So, I definitely agree with you on those points.

    Furthermore, I can empathize with the issue of poorly named database fields. The system that I just mentioned replaced a legacy system with an extremely poor data model. Most of the tables contained fields called sdata1, sdata2, sdata3 (string data), and so on as well as ndata1, ndata2, ndata3, (numeric data) etc. The numbers may go as high as 15 or 20. Apparently, the team that originally wrote the legacy system decided it would be a clever way to avoid modifying the database when additional fields were required. Naturally, there was never any documentation to record what actually went in those fields. It almost required a rosetta stone to figure out the purpose of each field.

    At any rate, I think all of your ideas are sound.

    Just my two cents…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s