I may have mentioned that of recent I have been doing a lot of work in the Data Warehousing arena. Today I met with some IT folks from another branch of the company who are considering a reporting strategy for their area. One of the people I was meeting with asked me “With so much data available, how do you decide what data to put in the warehouse first, versus what data do you leave in the application, either permanently or until a later point?
Great question, and I thought that you too might be interested in the answer.
Interapplication Reports. Historically trying to combine data from multiple applications has been painful, to put it nicely. Clearly than this turns out to be one of the most compelling reasons for data warehousing, to house data from multiple applications and allow users to easily combine that data into singular reports.
Phasing Out Historical or Ad-Hoc Systems. Accounting systems seem to have an existence all their own. For various reasons they live well beyond their normal lifespan. We have a system at work, written in an old DOS based reporting tool that dates back to the late 1980’s. Over the years it’s been used to do reporting from other systems. As it turns out it has some issues with Vista, and will need replacing. Rather than getting yet another system, we plan to replace its reports with ones from our data warehouse.
Friendlier Reporting. Often when I see databases, the field names are quite cryptic. Names like fklnam (foreign name last name) and accsbcd (account sub code) litter databases. It’s difficult enough for IT Professionals to decipher the field name mayhem, but asking users to do so just to create a few ad-hoc reports can be asking far too much. Not to mention the sometimes bizarre seeming relationships between tables.
Moving to a data warehouse allows you to give much saner, user friendly names to your data. In addition you can flatten out some of the tables, simplifying the relationship structures significantly.
Production Server Load Reduction. Production systems are usually optimized for dealing with single records at a time. As a result, searching through and retrieving data for large quantities of data can be resource intensive on the production system. Shifting reporting to a warehouse means a reduced load for the production system. In addition you elimante the chance that malformed SQL from some ad-hoc query can cripple your production system.
Ease of Offline Maintenance for Production Systems. Finally, having a warehouse makes it easier to take production systems offline for maintenance. If users know they can still get to their data via the warehouse, they will be less concerned about their production system going offline for work, which in turn makes it easier to schedule such work. If you have a system that requires frequent maintenance, your users will be less likely to give you grief if their data is available elsewhere.
Those are my primary ways in which we decide which data is targeted for inclusion into the warehouse. If you need to combine data from multiple applications, have older systems that need replacement, have cryptic field names or complex table relationships, need to reduce the load on your production server, or have systems that need frequent maintenance then consider those systems first for inclusion into a data warehouse.
I’d be curious to hear your comments on your strategies for determining inclusion into your own data warehouse.