I’ve been reading a lot of books on SSIS (SQL Server Integration Services) and BI (Business Intelligence) over the course of the year. I want to pass along a little tidbit I haven’t seen in any of them. I’ll preface this by stating our staging tables and data warehouse are all in SQL Server 2005.
Our process is probably similar to others, we pull the data in, and if the warehouse needs to be updated we place the data into a staging table. At the end of the process we do a mass update (via a SQL statement) from the staging table to the main data warehouse tables we use for reporting. Then we delete the records in the staging table. Which seemed like a reasonable thing to do, but wound up getting us in a lot of trouble. Over the course of the last few months our run times for the SSIS job have gotten slower and slower and sloooooooooooooower. Our job was taking as long as 50 minutes to complete sometimes. One of our developers noticed the database seemed to be taking up a lot of space. He found a simple select count(*) was taking eight minutes on what was supposed to be an empty staging table.
Some research on the web explained what we were doing wrong. In one of my favorite SQL Server blogs, I want some Moore, blogger Mladen Prajdic has a great article on the differences between delete and truncate.
The solution then was to not perform a delete, but a truncate on our staging tables. We went ahead and manually issued a truncate on our staging tables, and saw an immediate beneift. Our average run time went from 50 minutes to 8 minutes!
I’m not sure why I haven’t seen this mentioned before, perhaps I just haven’t read the right blog or book yet. But I wanted to pass this along so you could be spared some of the headaches we went through. If your SSIS uses SQL Server 2005 tables, use a truncate and not a delete to avoid speed issues. Alternatively, at least make sure to run truncates on a regular basis to keep those staging areas cleaned out!