Category Archives: SSIS

SQL Server Staging Tables – Truncate versus Delete

I’ve been reading a lot of books on SSIS (SQL Server Integration Services) and BI (Business Intelligence) over the course of the year. I want to pass along a little tidbit I haven’t seen in any of them. I’ll preface this by stating our staging tables and data warehouse are all in SQL Server 2005.

Our process is probably similar to others, we pull the data in, and if the warehouse needs to be updated we place the data into a staging table. At the end of the process we do a mass update (via a SQL statement) from the staging table to the main data warehouse tables we use for reporting. Then we delete the records in the staging table. Which seemed like a reasonable thing to do, but wound up getting us in a lot of trouble. Over the course of the last few months our run times for the SSIS job have gotten slower and slower and sloooooooooooooower. Our job was taking as long as 50 minutes to complete sometimes. One of our developers noticed the database seemed to be taking up a lot of space. He found a simple select count(*) was taking eight minutes on what was supposed to be an empty staging table.

Some research on the web explained what we were doing wrong. In one of my favorite SQL Server blogs, I want some Moore, blogger Mladen Prajdic has a great article on the differences between delete and truncate.

http://weblogs.sqlteam.com/mladenp/archive/2007/10/03/SQL-Server-Why-is-TRUNCATE-TABLE-a-DDL-and-not.aspx

The solution then was to not perform a delete, but a truncate on our staging tables. We went ahead and manually issued a truncate on our staging tables, and saw an immediate beneift. Our average run time went from 50 minutes to 8 minutes!

I’m not sure why I haven’t seen this mentioned before, perhaps I just haven’t read the right blog or book yet. But I wanted to pass this along so you could be spared some of the headaches we went through. If your SSIS uses SQL Server 2005 tables, use a truncate and not a delete to avoid speed issues. Alternatively, at least make sure to run truncates on a regular basis to keep those staging areas cleaned out!

Advertisements

SQL Server Integration Services and the “The script files failed to load” Error

I have an SSIS job that has been running on our test server since April, with no issues. Last week it just quit working. About the fourth package in I started getting “The script files failed to load” error when loading a very simple VB Script inside a package.

A web search led me to this forum page:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=267047&SiteID=1

To save you from wading through all the messages, the two causes of this error could be 1) Package’s PreCompile was set to False, or 2) I had breakpoints in the package.

So I check, and nope my PreCompile was set to true (the default) and there were no breakpoints. In addition, most of the people posting in the forum link above couldn’t get their packages to run in the first place, this didn’t seem to occur in packages like mine that had been running fine for a while.

Next I thought, OK what about environmental issues? Mining our extensive logs I was able to determine the time when one moment the scripts work, then 15 minutes later they no longer worked.

With that exact 15 minute interval known, the DBA team reviewed what had happened on the server, and sure enough a security patch for the .Net 1.1 Framework had been applied on the server during that exact time. Bingo!

It took us 2 days to work through the issue, the team, and I have to be honest and include myself here, had the thought process “well, the error is just on the test server, we’ll get to it when we can.” Since it was just a test server, and we weren’t testing anything at the time, we weren’t stressing over it.

We should have been. The very same patch was scheduled to be rolled to our production server just an hour and a half after we found it. Had that scheduled deployement occurred we would have been in deep diaper filling.

Let me make a quick distinction here, in that it’s not necessarily fixing the issue that’s urgent, but identifiying it. Once you have it identified, you can control it. By not quickly identifying the issue, we nearly rolled the same patch into production.

So the first point here, if your SSIS package which has been running for a while suddenly starts getting “The script files failed to load” error, check to see if any .Net Framework 1.1 patches were applied.

Second, this shows you how important good logging can be. With our extensive logging I was able to determine with a fifteen minute window when the problem occurred.

Finally, never treat issues that occur on your test server as if they are not important. We were quite lucky, narrowly avoiding disaster on our production server. Identify the cause quickly, so you can control it.

Update: We found a fix that worked, see https://arcanecode.wordpress.com/2007/07/18/ssis-issue-from-monday/ for more detail.

Arcane Thoughts: Thinking Inside the Box

Today I’m at an offsite meeting, talking about a new project. I won’t get into too many specific details, but we have to pull data from a web service and update an Oracle database. We can use a vendor provided Java API that runs on a Unix box to do the updates, or we can write to the database directly as long as we handle integrity issues.

So we spent the day brainstorming, to come up with possible solutions. Here is the list of contenders:

  • Write a Java app that runs on Unix that uses the vendor API’s.
  • Write a Java app that runs on Unix and updates the database directly.
  • Write a C# app that runs on a Windows Server, where a Batch Scheduler will kick it off.
  • Write a C# app that runs as a Windows Service under XP (we haven’t taken the Vista plunge at work yet).
  • Write a SQL Server Integration Services package that is run by the SQL Server job scheduler. It will use the web service as the input and update Oracle.
  • Use one of the above methods to pull the data then let BizTalk process it from there.

We haven’t made a decision yet, and my point was not so much to talk about the pro’s and cons of each solution. Instead it’s to get you to think creatively when it comes to new solutions for your company. Sitting down and cranking out yet another C# or VB.Net app may not always be the best approach. You may have a task you can accomplish with less code by using SQL Server Integration Services. Or maybe BizTalk might fit the bill.

All too often as programmers our first answer to any solution is to pull up Visual Studio and start grinding out code. Take some time though, to explore a few other options. There’s a rich set of tools out there, and sometimes the best solution to a programming problem may not be programming.

Arcane Solutions: A Data Warehouse

Tomorrow I’m doing a presentation for Microsoft, in conjunction with the vendor that we outsourced the project with, CTS (http://askcts.com). Our solution for the data warehouse was pretty cool, and I thought I’d briefly mention the technologies involved.

A SQL Server Integration Services solution reads data from our Oracle transaction system, and places the data into a SQL Server 2005 warehouse. This SSIS job runs every fifteen minutes, in what we call a “right time” scenario.

Next, the users first go to a SharePoint portal site. We have a master site for the company, the sub sites for each plant, then within those for each department. Our SharePoint site holds some report viewer SharePoint web parts, which hold SQL Server Reporting Services reports that act as our KPI (Key Performance Indicator) reports.

Also on the page is a special SharePoint web part we custom wrote in C# using the .Net 1.1 framework. This part reads a list of valid reports from a SQL Server 2005 table, then launches an ASP.Net 2.0 site.

The ASP.Net page, written in C# on the 2.0 framework, primarily houses a Microsoft report viewer control. It also has a drop down combo which holds a list of saved queries. These allow users to quickly pick combinations of stored report filters to apply to the reports.

We’ve also given users the ability to create ad-hoc reports by linking in the Report Builder control that’s part of the SQL Server 2005 suite. This lets the users create their own reports, without the necessity of providing ODBC connection info as we would with say Access.

I certainly don’t think Microsoft is perfect, as I’ve written before they certainly have room in some areas for improvement. But when it comes to their business applications, they are outstanding. I don’t know of any other company that puts out a suite of tools that integrate so nicely together, and provide the users such a complete, seamless solution.

SSIS and the Metadata Out of Sync Error

Worked late tonight on a problem we’ve been having with our SSIS (SQL Server Integration Services) package. It would run, but give us a weird error about the metadata being out of sync.

It turned out to be a self inflicted error. In the SQL that drove this particular package, the statement ended with …table1.field3, nvl(table2.field4, ‘Missing’) from table1…. Turns out SSIS didn’t like the nvl part. (nvl by the way is an Oracle statement that equates to if field4 is null, return ‘Missing’, otherwise return field4.) Oh, it understood the nvl command OK, but it didn’t like the fact we failed to alias the command in the SQL.

When I updated the SQL to read nvl(table2.field4, ‘Missing’) as field4 from…, AND I did a build on the package, SSIS was happy. Note though that it kept reporting the problem until I did another build on the package, it drove me nuts for a bit until I thought to rebuild the thing.

To sum it up, never leave just an expression in your SQL, always alias it or you too could fall victim to the mysterious Metadata out of Sync Error.

Arcane Searches

I thought it was time for another round of Arcane Searches. In case you missed the last round, these are some of the search terms users entered into their search engine and managed to run across my blog. I’ll try to pick out a few that seem to occur frequently and answer them. Let’s get started.

ssis package not reading environment variables

There is a policy setting on the server that dictates whether or not the SQL Server job engine can access environment variables.

virtual pc exit fullscreen

To exit full screen mode, use the Right ALT+ENTER combination. Note that the left ALT key won’t work, you must use the right ALT key, with the Enter key.

connectstring SQL compact, connectionstring sql compact edition

These two show up frequently, the connection string for SQL Server Compact Edition (SSCE) is:

Datasource=”C:\MyData\myFile.sdf”; Password=’mypassword’

Note a few things, first the password part is optional. Second, the password must be enclosed in single quotes. Finally, the path to your sdf data file must be in double quotes.

wrap a string in double quotes c#

I’m guessing the search here is how to embed a double quote in a string, that’s easy. Just use a backslash in front of the double quote, and C# will interpret it as a literal character and not a double quote.

string myvar = “Right before he became unconscious, he said \”Yes dear, those pants to make you look fat.\”. “

virtual pc 2007 different ip addresses

Each Virtual PC, be it 2004 or 2007, appears as it’s own PC to the network. Your network hub can’t tell the difference between a virtual machine and a real one, and assigns each machine it’s IP address.

why would I use an interface C#

Why not? OK, sorry, couldn’t resist. Short answer, an interface gives you a way to treat many kinds of objects as if they were all the same kind of object. For example, if you had manager, field worker, office worker, and executive objects, you could have all of the implement the employee interface, then you could treat them all like generic employees to do things like pay, give vacation, etc.

For a more detailed instruction, see my series of posts which begin at https://arcanecode.wordpress.com/2007/02/13/more-oop-interfaces-in-c-part-1/ or http://shrinkster.com/mjk.

There you go, a quick selection of the most common searches I found, that were not already answered in my previous blog entries.

SSIS, BIDS, and TNSNAMES

Today’s alphabet soup has to deal with an issue we struggled over this week at work. For those unfamiliar, SSIS is SQL Server 2005 Integration Services. It’s the rewrite of SQL Server 2000 DTS (Data Transformation Services). You use SSIS to pull data from one database and put it into another.

You develop SSIS in BIDS, Business Intelligence Developers Studio. BIDS is Visual Studio with the BI tools added in. The issue we’ve been fighting had to do with getting our SSIS packages to use Oracle 10g drivers to talk to the Oracle 9i data source that was the beginning point for our packages.

Over and over we got an error that the OLEDB provider could not “Acquire” the database, yet the data source kept passing the “Test Connection” test. It was one of my associates who came upon the clue.

In our environment, we don’t normally use a TNSNAMES.ORA file. TNSNAMES is where Oracle normally stores information about a schema name. Schema names are kind of like URLs. You use a human readable schema name, something like WAREHOUSE1. Oracle driver then goes into TNSNAMES to find the server information so it knows where to route the data to.

As I said, in our environment we don’t use TNSNAMES, instead we have another file. The data driver looks there to get the server that TNS info is stored at for the whole company. This allows us to have a single place to store (and update) TNS data instead of thousands of workstations.

The problem was, BIDS didn’t like that. Even though we had it redirected, BIDS didn’t recognize that and kept going to the TNSNAMES file to find schema info, and in our case it couldn’t find it because it wasn’t there. Once we added a TNSNAMES file BIDS was able to acquire it’s schema info, connect to the database and get whatever data it needed.

The moral of the story is, if you are having problems in BIDS connecting to an Oracle data source verify that you do indeed have a valid TNSNAMES.ORA file.