Using SSIS Package Configuration Values as Parameters to SQL Tasks

Last weekend I presented at SQL Saturday in Redmond WA. One attendee asked if it was possible to use values from the Package configuration file as parameters to a SQL Task. The answer was a definite yes, although I didn’t have a good example. This post aims to fix that.

Let’s start with the basic workflow. First you will need to create a variable, this variable will be used to pass the value from the config file to the SQL statement. Next you will need to establish a package configuration file, to hold the configured value. After that you will create an Execute SQL Task. In the input statement for the task, simply use ? for each parameter in the T-SQL statement. Finally you’ll click on the Parameter Mapping area of the SQL Task and map the variable to the position of each ?.

Now that you have a basic overview of where we’re going, let’s start this example by creating a simple table to update. Pick your favorite database, or create a test one, then run a simple script to create a table. I used my ArcaneCode database, which I use for samples, and created a table named TestParamToProcs in the dbo schema.

USE [ArcaneCode]
GO

/****** Object:  Table [dbo].[TestParamToProcs]    Script Date: 10/06/2009 21:45:19 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[TestParamToProcs](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [SomeText] [nvarchar](50) NULL,
CONSTRAINT [PK_TestParamToProcs] PRIMARY KEY CLUSTERED
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Now let’s create a new package. I started by opening BIDS (Business Intelligence Developer Studio), creating a solution and named it ConfigFileToSqlParams. I added a Shared Data Source to connect to my database. Next I renamed the default package to ParamToProcs.dtsx. I added the shared data source to the connection managers area of the package.

OK, that takes care of the basics. Next we need to setup the package for variables and a config file. Open up the variables window (click in the Control Surface area, and in the SSIS menu pick Variables.) Create a new variable, we’ll call it MyParam. Make it a string and insert the value of “X” into it.

Now right click on the Control Flow surface (all that yellow area in the Control Flow tab) and pick Package Configurations. Click the check box to “Enable package configurations”, then click the Add button at the bottom of the screen. You are now presented with the Package Configuration Wizard. The first screen is just a welcome screen, you can click past it. The next screen is the “Select Configuration Type” screen. Make sure the default of “XML configuration file” is selected in the drop down. Next under “Specify configuration files directly” enter the path and file name for your config XML file. I named mine ParamToProcsConfig.xml and clicked Next.

The next step in the wizard is the “Select Properties to Export” screen. Expand the Variables area in the tree, then the tree for MyParam, then Properties. Now make sure the only box checked is the one for “Value”.

1_PropsToExport

Click Next once your screen looks like the one above, and you’ll be at the completion screen. Give your package config a good name, I used MyParamConfig, and clicked Finish.

Now let’s drop 3 SQL Tasks into the Control Flow surface. The first task will simply delete all the rows in the table that was just created. That way we can run the package repeatedly. For all three tasks set the Connection to the shared source you added in the Connection Managers area. Now for the SQLStatement of our first task, we’ll use Direct Input and enter this SQL statement:

DELETE FROM dbo.TestParamToProcs

For the next SQL task we’ll insert a row into our table to manipulate. Use this for the SQL Statement property.

INSERT INTO dbo.TestParamToProcs (SomeText) VALUES (‘Hello’)

OK, those were fairly easy and straight forward. Now comes the fun part. For the next SQL Task, we’ll use this statement for the SQLStatement property.

UPDATE dbo.TestParamToProcs SET SomeText = ?

Note the use of the ? (question mark) for the value of the SomeText field. This is the parameter. Now we need to supply a value to this parameter. Over on the left, click the Parameter Mapping tab. Click Add and a new row will be placed in the grid. In the first column, Variable Name, from the drop down pick “User::MyParam”. Leave the Direction as “Input”. Since this is a string variable, we’ll need to change the Data Type column to “VARCHAR”. Finally we need to set the Parameter Name, For this we’ll enter a 0 (zero). We’ll leave the Parameter Size at the default of –1.

2_Params

When your dialog resembles this one, you can click OK. If you needed to use more than one parameter, just use ? for each place where a parameter would go in your T-SQL statement. Then use the values of 0, 1, 2 etc for the Parameter Name. 0 will map to the first ?, 1 to the second, and so on. OK at this point your SSIS package should look something like:

3_Package

If you now run the package, then jump into SQL Server Management Studio and run a “SELECT * FROM dbo.TestParamToProcs”, you should see one row with the SomeText column set to X. Great! Now lets give it one more test. We should change the config file, and run the package outside of BIDS.

First, open notepad or some other text editor, then open up the ParamToProcsConfig.XML file. Move to the <ConfiguredValue> tags, and change the X to something else. I used “ArcaneCode” as it was rather spiffy, but you can use your name, or your cat’s name if you like. Save the file and you can close your text editor.

Now let’s use dtexec to run our package. Open up a command window and navigate to the folder where your package is located. I then used the dtexecui utility to create a package execution command. I’d encourage you to play with this, but so we’re all on the same page here is the dtexec command line I came up with:

dtexec /FILE "D:\Presentations\SQL Server\ConfigFileToSqlParams\ConfigFileToSqlParams\ConfigFileToSqlParams\ParamToProcs.dtsx" /CHECKPOINTING OFF  /REPORTING EW

Of course you’ll want to change the path to wherever you have the ParamToProcs.dtsx package. You should be able to run the above command line and execute your package. If you the jump back to SSMS and rerun the “SELECT * FROM dbo.TestParamToProcs”, you should now see the row with the new value from the config file.

To summarize, the basic steps are:

1. Create the variable

2. Store the variable in a package configuration XML file.

3. Create a SQL Task. Use ? for the parameters in the T-SQL statement.

4. Map the variables to the parameters.

5. Run. Be happy.

And that’s all there is to it.

Calling SSIS from .Net

In a recent DotNetRocks show, episode 483, Kent Tegels was discussing SQL Server Integration Services and how it can be useful to both the BI Developer as well as the traditional application developer. While today I am a SQL Server BI guy, I come from a long developer background and could not agree more. SSIS is a very powerful tool that could benefit many developers even those not on Business Intelligence projects. It was a great episode, and I high encourage everyone to listen.

There is one point though that was not made very clear, but I think is tremendously important. It is indeed possible to invoke an SSIS package from a .Net application if that SSIS package has been deployed to the SQL Server itself. This article will give an overview of how to do just that. All of the sample code here will also be made available in download form from the companion Code Gallery site, http://code.msdn.microsoft.com/ssisfromnet .

In this article, I do assume a few prerequisites. First, you have a SQL Server with SSIS installed, even if it’s just your local development box with SQL Server Developer Edition installed. Second, I don’t get into much detail on how SSIS works, the package is very easy to understand. However you may wish to have a reference handy. You may also need the assistance of your friendly neighborhood DBA in setting up the SQL job used in the process.

Summary

While the technique is straightforward, there are a fair number of detailed steps involved. For those of you just wanting the overview, we need to start with some tables (or other data) we want to work with. After that we’ll write the SSIS package to manipulate that data.

Once the package is created it must be deployed to the SQL Server so it will know about it. This deploy can be to the file system or to SQL Server.

Once deployed, a SQL Server Job must be created that executes the deployed SSIS package.

Finally, you can execute the job from your .Net application via ADO.NET and a call to the sp_start_job stored procedure built into the msdb system database.

OK, let’s get to coding!

Setup the Tables

First we need some data to work with. What better than a listing of previous Dot Net Rocks episodes? I simply went to the Previous Shows page, highlighted the three columns of show number, show name, and date, and saved them to a text file. (Available on the Code Gallery site.)

Next we need a place to hold data so SSIS can work with it. I created a database and named it ArcaneCode, however any database should work. Next we’ll create a table to hold “staging” DNR Show data.

CREATE TABLE [dbo].[staging_DNRShows](
  [ShowData] [varchar](250) NOT NULL
) ON [PRIMARY]

This table will hold the raw data from the text file, each line in the text file becoming one row here. Next we want a table to hold the final results.

CREATE TABLE [dbo].[DNRShows](
  [ShowNumber] [int] NOT NULL,
  [ShowName] [varchar](250) NULL,
  [ShowDate] [datetime] NULL,
  CONSTRAINT [PK_DNRShows] PRIMARY KEY CLUSTERED
  (
  [ShowNumber] ASC
  )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  ) ON [PRIMARY]

The job of the SSIS package will be to read each row in the staging table and split it into 3 columns, the show’s number, name, and date, then place those three columns into the DNRShows table above.

The SSIS Package

The next step is to create the SSIS package itself. Opening up Visual Studio / BIDS, create a new Business Intelligence SQL Server Integration Services project. First let’s setup a shared Data Source to the local server, using the ArcaneCode database as our source.

The default package name of “Package.dtsx” isn’t very informative, so let’s rename it ”LoadDNRShows.dtsx”. Start by adding a reference to the shared data source in the Connection Managers area, taking the default. Then in the Control Flow surface add 3 tasks, as seen here:

clip_image001

The first task is an Execute SQL Task that simply runs a “DELETE FROM dbo.DNRShows” command to wipe out what was already there. Of course in a true application we’d be checking for existing records in the data flow and doing updates or inserts, but for simplicity in this example we’ll just wipe and reload each time.

The final task is also an Execute SQL Task, after we have processed the data we no longer need it in the staging table, so we’ll issue a “DELETE FROM dbo.staging_DNRShows” to remove it.

The middle item is our Data Flow Task. This is what does the heavy lifting of moving the staging data to the main table. Here is a snapshot of what it looks like:

clip_image002

The first task is our OLEDB Source, it references the staging_DNRShows table. Next is what’s called a Derived Column Transformation. This will allow you to add new calculated columns to the flow, or add columns from variables. In this case we want to add three new columns, based on the single column coming from the staging table.

clip_image003

As you can see in under Columns in the upper left, we have one column in our source, ShowData. In the lower half we need to add three new columns, ShowNumber, ShowDate, and ShowName. Here are the expressions for each:

ShowNumber
    (DT_I4)SUBSTRING(ShowData,1,FINDSTRING(ShowData,"\t",1))

ShowDate
    (DT_DBDATE)SUBSTRING(ShowData,FINDSTRING(ShowData,"\t",2) + 1,LEN(ShowData) – FINDSTRING(ShowData,"\t",2))

ShowName
    (DT_STR,250,1252)SUBSTRING(ShowData,FINDSTRING(ShowData,"\t",1) + 1,FINDSTRING(ShowData,"\t",2) – FINDSTRING(ShowData,"\t",1) – 1)

The syntax is an odd blend of VB and C#. Each one starts with a “(DT_”, these are type casts, converting the result of the rest of the expression to what we need. For example, (DT_I4) converts to a four byte integer, which we need because in our database the ShowNumber column was defined as an integer. You will see SUBSTRING and LEN which work like their VB counterparts. FINDSTRING works like the old POS statement, it finds the location of the text and returns that number. The “\t” represents the tab character, here the C# fans win out as the Expression editor uses C# like escapes for special characters. \t for tab, \b for backspace, etc.

Finally we need to write out the data. For this simply add an OLEDB Destination and set it to the target table of dbo.DNRShows. On the mappings tab make sure our three new columns map correctly to the columns in our target table.

Deploy the Package

This completes the coding for the package, but there is one final step we need to do. First, in the solution explorer right click on the project (not the solution, the project as highlighted below) and pick properties.

clip_image004

In the properties dialog, change the “CreateDeploymentUtility” option from false (the default) to True.

clip_image006

Now click the Build, Build Solution menu item. If all went well you should see the build was successful. It’s now time to deploy the package to the server. Navigate to the folder where your project is stored, under it you will find a bin folder, and in it a Deployment folder. In there you should find a file with a “.SSISDeploymentManifest” extension. Double click on this file to launch the Package Installation Wizard.

When the wizard appears there are two choices, File system deployment and SQL Server deployment. For our purposes we can use either one, there are pros and cons to each and many companies generally pick one or the other. In this example we’ll pick SQL Server deployment, but again know that I’ve tested this both ways and either method will work.

Once you pick SQL Server deployment, just click Next. Now it asks you for the server name, I’ve left it at (local) since I’m working with this on a development box; likewise I’ve left “Use Windows Authentication”. Finally I need the package path, I can select this by clicking the ellipse (the …) to the right of the text box. This brings up a dialog where I can select where to install.

clip_image007

In a real world production scenario we’d likely have branches created for each of our projects, but for this simple demo we’ll just leave it in the root and click OK.

Once your form is filled out as below, click Next.

clip_image008

We are next queried to what our installation folder should be. This is where SSIS will cache package dependencies. Your DBA may have a special spot setup for these, if not just click next to continue.

Finally we are asked to confirm we know what we are doing. Just click Next. If all went well, the install wizard shows us it’s happy with a report, and we can click Finish to exit.

Setup the SQL Server Job

We’ve come a long way and we’re almost to the finish line, just one last major step. We will need to setup a SQL Server Job which will launch the SSIS package for us. In SQL Server Management Studio, navigate to the “SQL Server Agent” in your Object Explorer. If it’s not running, right click and pick “Start”. Once it’s started, navigate to the Jobs branch. Right click and pick “New Job”.

When the dialog opens, start by giving your job a name. As you can see below I used LoadDNRShows. I also entered a description.

clip_image010

Now click on the Jobs page over on the left “Select a page” menu. At the bottom click “New” to add a new job step.

In the job step properties dialog, let’s begin by naming the step “Run the SSIS package”. Change the Type to “SQL Server Integration Services Package”. When you do, the dialog will update to give options for SSIS. Note the Run As drop down, this specifies the account to run under. For this demo we’ll leave it as the SQL Server Agent Service Account, check with your DBA as he or she may have other instructions.

In the tabbed area the General tab first allows us to pick the package source. Since we deployed to SQL Server we’ll leave it at the default, however if you had deployed to the file system this is where you’d need to change it to pick your package.

At the bottom we can use the ellipse to pick our package from a list. That done your screen should look something like:

clip_image011

For this demo that’s all we need to set, I do want to take a second to encourage you to browse through the other tabs. Through these tabs you can set many options related to the package. For example you could alter the data sources, allowing you to use one package with multiple databases.

Click OK to close the job step, then OK again to close the Job Properties window. Your job is now setup!

Calling from .Net

The finish line is in sight! Our last step is to call the job from .Net. To make it a useful example, I also wanted the .Net application to upload the data the SSIS package will manipulate. For simplicity I created a WinForms app, but this could easily be done in any environment. I also went with C#, again the VB.Net code is almost identical.

I started by creating a simple WinForm with two buttons and one label. (Again the full project will be on the Code Gallery site).

clip_image012

In the code, first be sure to add two using statements to the standard list:

using System.Data.SqlClient;

using System.IO;

Behind the top button we’ll put the code to copy the data from the text file we created from the DNR website to the staging table.

    private void btnLoadToStaging_Click(object sender, EventArgs e)

    {

      /* This method takes the data in the DNRShows.txt file and uploads them to a staging table */

      /* The routine is nothing magical, standard stuff to read as Text file and upload it to a  */

      /* table via ADO.NET                                                                      */

 

      // Note, be sure to change to your correct path

      string filename = @"D:\Presentations\SQL Server\Calling SSIS From Stored Proc\DNRShows.txt";

      string line;

 

      // If you used a different db than ArcaneCode be sure to set it here

      string connect = "server=localhost;Initial Catalog=ArcaneCode;Integrated Security=SSPI;";

      SqlConnection connection = new SqlConnection(connect);

      connection.Open();

 

      SqlCommand cmd = connection.CreateCommand();

 

      // Wipe out previous data in case of a crash

      string sql = "DELETE FROM dbo.staging_DNRShows";

      cmd.CommandText = sql;

      cmd.ExecuteNonQuery();

 

      // Now setup for new inserts

      sql = "INSERT INTO dbo.staging_DNRShows (ShowData) VALUES (@myShowData)";

 

      cmd.CommandText = sql;

      cmd.Parameters.Add("@myShowData", SqlDbType.VarChar, 255);

 

      StreamReader sr = null;

 

      // Loop thru text file, insert each line to staging table

      try

      {

        sr = new StreamReader(filename);

        line = sr.ReadLine();

        while (line != null)

        {

          cmd.Parameters["@myShowData"].Value = line;

          cmd.ExecuteNonQuery();

          lblProgress.Text = line;

          line = sr.ReadLine();

        }

      }

      finally

      {

        if (sr != null)

          sr.Close();

        connection.Close();

        lblProgress.Text = "Data has been loaded";

      }

 

Before you ask, yes I could have used any number of data access technologies, such as LINQ. I went with ADO.NET for simplicity and believing most developers are familiar with it due to its longevity. Do be sure and update the database name and path to the file in both this and the next example when you run the code.

This code really does nothing special, just loops through the text file and uploads each line as a row in the staging table. It does however serve as a realistic example of something you’d do in this scenario, upload some data, then let SSIS manipulate it on the server.

Once the data is there, it’s finally time for the grand finale. The code behind the second button, Execute SSIS, does just what it says; it calls the job, which invokes our SSIS package.

    private void btnRunSSIS_Click(object sender, EventArgs e)

    {

      string connect = "server=localhost;Initial Catalog=ArcaneCode;Integrated Security=SSPI;";

      SqlConnection connection = new SqlConnection(connect);

      connection.Open();

 

      SqlCommand cmd = connection.CreateCommand();

 

      // Wipe out previous data in case of a crash

      string sql = "exec msdb.dbo.sp_start_job N’LoadDNRShows’";

      cmd.CommandText = sql;

      cmd.ExecuteNonQuery();

      connection.Close();

      lblProgress.Text = "SSIS Package has been executed";

 

    }

The key is this sql command:

exec msdb.dbo.sp_start_job N’LoadDNRShows’

“exec” is the T-SQL command to execute a stored procedure. “sp_start_job” is the stored procedure that ships with SQL Server in the MSDB system database. This stored procedure will invoke any job stored on the server. In this case, it invokes the job “LoadDNRShows”, which as we setup will run an SSIS package.

Launch the application, and click the first button. Now jump over to SQL Server Management Studio and run this query:

select * from dbo.staging_DNRShows;

select * from dbo.DNRShows;

You should see the first query bring back rows, while the second has nothing. Now return to the app and click the “Execute SSIS” button. If all went well running the query again should now show no rows in our first query, but many nicely processed rows in the second. Success!

A few thoughts about xp_cmdshell

In researching this article I saw many references suggesting writing a stored procedure that uses xp_cmdshell to invoke dtexec. DTEXEC is the command line utility that you can use to launch SSIS Packages. Through it you can override many settings in the package, such as connection strings or variables.

xp_cmdshell is a utility built into SQL Server. Through it you can invoke any “DOS” command. Thus you could dynamically generate a dtexec command, and invoke it via xp_cmdshell.

The problem with xp_cmdshell is you can use it to invoke ANY “DOS” command. Any of them. Such as oh let’s say “DEL *.*” ? xp_cmdshell can be a security hole, for that reason it is turned off by default on SQL Server, and many DBA’s leave it turned off and are not likely to turn it on.

The techniques I’ve demonstrated here do not rely on xp_cmdshell. In fact, all of my testing has been done on my server with the xp_cmdshell turned off. Even though it can be a bit of extra work, setting up the job, etc., I still advise it over the xp_cmdshell method for security and the ability to use it on any server regardless of its setting.

In Closing

That seemed like a lot of effort, but can lead to some very powerful solutions. SSIS is a very powerful tool designed for processing large amounts of data and transforming it. In addition developing under SSIS can be very fast due to its declarative nature. The sample package from this article took the author less than fifteen minutes to code and test.

When faced with a similar task, consider allowing SSIS to handle the bulk work and just having your .Net application invoke your SSIS package. Once you do, there are no ends to the uses you’ll find for SQL Server Integration Services.

Intro to DW/BI at the Steel City SQL Users Group

Tonight I’ll be presenting “Introduction to Data Warehousing / Business Intelligence” at the Steel City SQL users group, right here in Birmingham Alabama. If you attended my Huntsville presentation last week, I’ve already added some new slides and revised the deck, so it will be worth another look.

My slide deck is IntroToDataWarehouse.pdf . Come join us tonight at 6 pm at New Horizons, there will be pizza and fun for all.

UPDATE: Before the presentation I was showing a video of Sara Ford jumping off a tower to support CodePlex. Got tons of laughs so here’s a link to the video:

http://blogs.msdn.com/saraford/archive/2009/09/14/my-codeplex-jump-from-tallest-building-in-the-southern-hemisphere-the-full-video.aspx

Data Warehousing / BI at the next HuntUG Meeting!

Business Intelligence is one of the most in demand skill sets right now. Do you want to know more about it? Be guided through all the terminology and concepts? Do you live in the Huntsville Alabama area? Well here’s your golden opportunity!

On Tuesday, September 8th I will be presenting “Introduction to Data Warehousing / Business Intelligence” at the next meeting of the Huntsville User Group. I’ll demystify all the terms around DW/BI and give a demonstration of the Microsoft SQL Server tools used in the DW/BI process. See their site for meeting time and directions. 

SSIS For Developers at CodeStock 2009

At the 2009 CodeStock event I am presenting SQL Server Integration Services for Developers. This class will demonstrate tasks commonly done by VB.Net or C# developers within SQL Server Integration Services.

The sample project and documentation for the lab can be found on the code gallery site at http://code.msdn.microsoft.com/SSISForDevs .

Introduction To Data Warehousing and Business Intelligence

At the Atlanta SQL Saturday 2009 one of the presentations I am doing is “Introduction to Data Warehousing and Business Intelligence”.

You can download the slide deck for this presentation in PDF format.

Any sample code came from either my Intro to SSIS presentation or the book Programming SQL Server 2008.

Tracking down SQL Server Integration Services issues with Collation

At work I’ve been developing a big suite of packages to convert data from an Oracle system into our Data Warehouse. A lot of the supporting tables are almost a straight pull, except we are of course adding our own primary key, and then setting up a non clustered unique index on what had been the primary key in the old system.

A few tables have been driving us batty though, giving us “duplicate value” errors when trying to insert the rows from our SSIS package. The first thing I did to try and track down the problem was create an error table, and instead of having the package fail have it redirect error rows to my new error table. In case you are wondering, this is going to be a “one time shot” use for the packages, so we chose not to invest a lot of time and effort into error handling. We either want all the rows or none, and we’ll be running the packages manually so we’ll be there to know the results. But I digress.

When I went to look at the error table, it had all the rows from our source system in it. I scratched my head, thinking that can’t be right. A quick search found the answer in the Technet Forums. I needed to go into the OLE DB Destination and set the Max Commit count to 1. Of course you wouldn’t want to leave it like that for production, but for debugging it worked great. Once I did that, I was able to rerun the package and quickly identify my misbehaving row.

Next I looked at the value, and then looked for a similar value in my table. What I found was my source system had two rows, something like this example:

Arcane Code

Arcane code

Yes, the only difference was the second row had a lowercase letter at the beginning of the second word. Our Oracle instance had case sensitivity turned on. To it, these were two entirely different values. However, by default SQL Server is case insensitive; to it these two were the same. So my dilemma, how to fix this one column without having to alter my entire database?

It turns out there is an option in the Create Table syntax to set the collation. First, you should find out what your collation is currently set to. This is easy enough, just open SQL Server Management Studio, right click on the database and pick properties. Right there on the front page is the Collation.

image

Alternatively I could have run this SQL in SSMS (substitute your database name where I have AdventureWorks2008):

select databasepropertyex('AdventureWorks2008', 'collation')

Either way, in this example the default is SQL_Latin1_General_CP1_CI_AS. The important thing to note is the “_CI_”, which indicates case insensitivity. If we wanted to set the entire database, we would issue commands to change this to SQL_Latin1_General_CP1_CS_AS, which stands for case sensitivity. But as I said, in my case I don’t want to affect the entire database, so instead I will use this collation name in the create table syntax. Here is a simple example:

create table TestTable
(
  BogusPK bigint identity
  , FieldFromOracle varchar(200) collate SQL_Latin1_General_CP1_CS_AS not null
  , AnotherField varchar(200) null
)

All that I had to do was insert the collate clause between the data type and the not null clause. Note that this only affects the one column I had an issue with. FieldFromOracle is now a case sensitive column, I can add “Arcane Code” and “Arcane code” and still be able to add a unique index. The second column, here named “AnotherField” will remain case insensitive, the behavior you normally expect.

Before I wrap this up, I know someone will point out that allowing primary keys in your system that only differ in case is bad practice. For the record I totally agree, however this is a soon to be legacy system built by a vendor. Additionally, for various reasons I was not allowed to do any data cleansing to the source system. Just pull it like it is and put it in the warehouse. I imagine most of you are like me, that you don’t get to live in the ideal world, so hopefully knowing how to diagnose and deal with collation issues between databases will make your life a little easier.

Arcane Review: Expert SQL Server 2005 Integration Services

If you recall my “Good Reads” post from June 25th, you will remember I am a big believer in books as a learning medium. I like to employ a lot of different ways to learn: user groups, blogs, podcasts, videocasts, and magazines to name a few. But for really in depth coverage, it’s hard to beat a nice book in your hands. I got some good feedback from my mention last week of Andy Leonard’s new e-book on Data Dude, so I thought that I would continue by adding book reviews to the blog every so often.

Expert SQL Server 2005 Integration Services For this review I thought I’d cover a book that seems to constantly be on my desk lately: Expert SQL Server 2005 Integration Services, by Brian Knight and Eric Veerman. This book does a really good job and is specifically targeted toward the data warehousing professional. One entire chapter is devoted to ETL for dimension tables; another chapter focuses on the fact tables. It was great to have coverage so focused on these topics.

Another favorite part of the book is the two chapters on deploying and managing SSIS packages. So often these topics are glossed over, especially the managing piece. The book does a great job in covering all the tools and practices around this subject. I’ll mention one more chapter, one that focuses on package reliability. They cover logging, auditing, event handling, checkpoint files, and even suggestions on testing error handling logic.

There are many more chapters in the book, such as migration from DTS (SQL Server 2000) and Scalability, for you to discover. The other thing I love about this book is the brevity. The authors cover an amazing amount of information in just 382 pages. As a busy, busy person I very much appreciate the conciseness they achieved without sacrificing any clarity.

I’ve met both authors, and have heard them speak. They are both very nice, knowledgeable individuals, and I highly encourage you to attend one of their presentations if you get the chance, or if not at least buy their book from your favorite retailer; you will find it a great investment.

Atlanta Code Camp – Introduction to SQL Server 2005 Integration Services (SSIS)

Thanks to everyone who stuck with me during my Saturday morning presentation at the Atlanta Code Camp. For those who didn’t make it, I had come down with either food poisoning or some sort of virus on Thursday night and was extremely sick on Friday. I had recovered enough Saturday to make the camp, even though my voice was just about gone. I promised to post my materials, so without further delay here they are:

First off, here’s the power point slides, in PDF format: Intro to SSIS Slide Deck

Next, here is the script I used to generate my demo: Intro to SSIS Script. If anyone has issues with the directions, please e-mail me and let me know, this is my first pass at this format and I want to ensure it’s usable for everyone.

Finally, here is the project:

SSIS Test 1_zip

I am feeling a bit better today, slept most of Sunday and while I’ve totally lost my voice my fingers still work so I wanted to get this out here.

Thanks to the folks in Atlanta for a great code camp, and thanks again to everyone who attended my session it was a great crowd.

Atlanta Code Camp

I just wanted to let everyone know I will be speaking at the Atlanta Code Camp, on Saturday March 29th 2008. My subject will be Introduction to SQL Server 2005 Integration Services (SSIS). From the preliminary schedule I’ve seen, this looks like an awesome code camp, eight tracks this time! The SQL Server track concentrates a lot on BI (Business Intelligence). Three of the five sessions are on SSIS, so if you are looking to learn more about this subject this is the place to be!

The camp fills up quick, I’m surprised it hasn’t reached it’s limit yet. Just a little over a week away so head over to their site and register now!

Introduction to SQL Server Integration Services – Huntsville Alabama Code Camp

My second post of the day at Alabama Code Camp 6 in Huntsville is “Introduction to SQL Server Integration Services”.

The slide deck is here: Intro to SSIS Slide Deck

The “cheat sheet” or script I used to do the demo is here: Script for doing the SSIS Demo You can step through it to recreate all of the things I did in the demo today.

Finally here is the finished project. It’s actually zipped, but my current host doesn’t like zip extensions so when you download it change the extension from txt back to ZIP. Finished SSIS Project

Don’t Uninstall Visual Studio 2005 Yet!

One of the great benefits of Visual Studio 2008 is the ability for it to target multiple .Net Frameworks. This means, in theory you could go ahead and begin using Visual Studio 2008 even though you still need to write apps that are 2005 / .Net 2.0 compliant. You might be tempted to go ahead and uninstall 2005. And that would be fine if you are only doing .Net development. But wait…

If you are still doing SQL Server BIDS (Business Intelligence Developer Studio) then don’t uninstall Visual Studio 2005! Currently there’s no support in VS2008 for doing SQL Server 2005 BIDS Development. If you uninstall VS2005 you won’t be able to do any more BIDS work. Trust me, I found out the hard way.

After uninstalling VS2005, I went to do a BIDS project and that’s when I got hit with the nasty surprise. The uninstall had also removed the Dev Environment that was shared with BIDS. I tried to rerun the install of my SQL Server Developer Edition, but for some reason it thought I wanted to upgrade. It kept giving me the message “You cannot upgrade a version of SQL Server from the GUI, you must use the command line.”

I finally had to reinstall VS2005, along with all it’s service packs. After that I was able to work on my BIDS projects again. So take it from me, if you are still doing SQL Server 2005 Business Intelligence projects, Visual Studio 2005 still has some life in it yet.

Alabama Code Camp

The sixth Alabama Code Camp is coming up February 23rd, 2008. Registration is now open, as is the call for speakers. Many, including myself have submitted, you can see them by going to the Alabama Code Camp site and clicking on the speakers link. The list of speakers is very impressive, no less than eight MVPs, and at least two authors. I’m humbled to be amongst such distinguished company!

Here’s the synopsis for my two sessions, in case you are curious:

Introduction to SQL Server Integration Services

Whether you are creating a full blown data warehouse, doing a data conversion from an old system to a new one, or integrating applications together SQL Server Integration Services can help. Get an overview of this powerful tool built into SQL Server.

The Developer Experience

Learn about tips and tricks to enhance your experience as a developer both in the physical world and the virtual world. See hardware that can make your life easier, software additions for Windows and Visual Studio, even how just a few tweaks in the Visual Studio options can make your experience as a developer more pleasant and productive.

This is shaping up to be an impressive code camp, so don’t hesitate and get registered today!

Does MacGyver Dream of Mark Miller?

For Christmas this year my family gave me a copy of MacGyver, Season 1 and 2 on DVD. My wife’s side of the family gave me a gift card which I used to get seasons 3 & 4. I’m a long time MacGyver fan, but my wife had only seen one or two episodes and my kids had never seen it at all, so we’ve been having a lot of fun watching. My favorite part of the series was the voice-overs, where you’d hear MacGyver’s voice as he explained what he was doing. It always started with some odd thought or story that led you through the thought process of how he came to the conclusion to build whatever wacky life saving device he was constructing.

I’ve come to realize in some ways these blog entries are sort of like the MacGyver voice-overs, my inner thoughts being created for you on the web. So I hope you’ll bear with me a few minutes while I relate a rather bizarre dream I had last night.

In my dream I’m standing on stage, in front of a fully loaded computer. It has all the bells and whistles, VS2008, SQL Server, and so on. On the other side of the stage, Mark Miller is there, in front of a similar computer. For those unfamiliar with Miller, he’s the genius behind CodeRush and RefactorPro, tools to help you write code faster. Some time back, when the product was first released Miller used to challenge the audience to beat him in a code writing contest. His machine had CodeRush, and he would use chopsticks to write code, his competitor could use their fingers but did not have CodeRush on their machine. Of course Mark always won.

So sure enough, in my dream there’s Miller, chopsticks in hand ready to go, and I’m the guy going up against him. Our task is to take data from table A1, create a mirror table and name it table A2, and then move all million rows from A1 into A2. As you might guess, in my dream, I win. How?

Well I didn’t write a program. Instead I first jumped into SQL Server Management Studio (SSMS) and used its script generating capability to produce a create table script. Make a quick search and replace and boom I’ve got table A2 created. I then jump over to the Business Intelligence Developer Studio (BIDS) to create a SQL Server Integration Services (SSIS) package to do the data move. (Yes, I probably could have used the script generation of SSMS again to generate an Insert script, but I was showing off.) In about three to four minutes I had accomplished the task and moved all the data while Miller was pecking away at computer with his chopsticks.

I didn’t win because I’m a hot shot coder who is smarter than my competitor. Miller is a (some say mad) genius who can run circles around me in the coding world. As I told the folks in my dream, and I’m telling you now sometimes the best solution to a programming challenge isn’t to program at all! If you read yesterday’s post, Straining at Gnats, you may recall I said “…take some time. Push back from your computer and think for a moment. Think what the true outcome of your application is supposed to be. Not “what will the program do” but “what will the program do for the user???” Think about how best to achieve the user’s goals.

When you are thinking about solutions, take a minute to look outside of your favorite programming language. Is it possible to achieve the goal without writing any code at all? What tool or tools do you have in your tool box that you can combine to get the job done? Here’s a great example that happened to me just before I took off on my holiday vacation.

As I’ve mentioned before at work we have a Business Intelligence (BI) app I work as the lead on, it imports data to a SQL Server 2005 warehouse via SSIS then uses SQL Server Reporting Services (SSRS) to generate reports. The data is imported from a work order management system we bought many years ago. We also have some engineers who have a tiny little Microsoft Access database. This database has a primary key column; we’ll call it a part number for purposes of this example. There are three more columns, some data they need to know for each part but are not found in our big system. They’d like to add this data to the reports our BI app generates. Two last pieces of information, they only update this data once per quarter. Maybe. The last few years they have only done 3 updates a year. Second, the big system I mentioned is due to be replaced sometime in the next two years with a new system that will have their three fields.

A lot of solutions presented themselves to me. Write an ASP.Net app, with a SQL Server back end then use SSIS to move the data. Elegant, but a lot of work, very time consuming for a developer, especially for something that can go away in the near future. Write an SSIS package to pull data from Access? Risky, since we had no control over the Access database. A user could rename columns or move the database all together, in either case trashing the SSIS. Several other automation solutions were considered and rejected, before the final solution presented itself: not to automate at all.

Once per quarter I’ll simply have the engineers send me their Access database. Microsoft Access has a nice upsizing wizard that will move the table to SQL Server, I’ll use that to push the data onto the SQL Server Express that runs on my workstation. I’ll then use the script generating capability of SSMS to make an Insert script for the data. Add a truncate statement to the top to remove the old data and send it to the DBA to run. When I ran through it the first time my total time invested was less than ten minutes. In a worst case scenario I spend 40 minutes a year updating the data so it’s available for reporting. That’s far, far less time that I would have spent on any other solution.

The next time you have a coding challenge, take a moment to “think like MacGyver”. Look at all the tools you have lying around your PC and see what sort of solutions you can come up with. Once you are willing to step outside the comfort zone of your favorite coding language, you may be able to come up with some creative, MacGyver like solutions to your user’s problems.

 

PS – If you missed the announcement while on vacation, DevExpress just released CodeRush / RefactorPro 3.0. More than 150 refactorings and lots of new CodeRush features! Update yours today.

SQL Server Staging Tables – Truncate versus Delete

I’ve been reading a lot of books on SSIS (SQL Server Integration Services) and BI (Business Intelligence) over the course of the year. I want to pass along a little tidbit I haven’t seen in any of them. I’ll preface this by stating our staging tables and data warehouse are all in SQL Server 2005.

Our process is probably similar to others, we pull the data in, and if the warehouse needs to be updated we place the data into a staging table. At the end of the process we do a mass update (via a SQL statement) from the staging table to the main data warehouse tables we use for reporting. Then we delete the records in the staging table. Which seemed like a reasonable thing to do, but wound up getting us in a lot of trouble. Over the course of the last few months our run times for the SSIS job have gotten slower and slower and sloooooooooooooower. Our job was taking as long as 50 minutes to complete sometimes. One of our developers noticed the database seemed to be taking up a lot of space. He found a simple select count(*) was taking eight minutes on what was supposed to be an empty staging table.

Some research on the web explained what we were doing wrong. In one of my favorite SQL Server blogs, I want some Moore, blogger Mladen Prajdic has a great article on the differences between delete and truncate.

http://weblogs.sqlteam.com/mladenp/archive/2007/10/03/SQL-Server-Why-is-TRUNCATE-TABLE-a-DDL-and-not.aspx

The solution then was to not perform a delete, but a truncate on our staging tables. We went ahead and manually issued a truncate on our staging tables, and saw an immediate beneift. Our average run time went from 50 minutes to 8 minutes!

I’m not sure why I haven’t seen this mentioned before, perhaps I just haven’t read the right blog or book yet. But I wanted to pass this along so you could be spared some of the headaches we went through. If your SSIS uses SQL Server 2005 tables, use a truncate and not a delete to avoid speed issues. Alternatively, at least make sure to run truncates on a regular basis to keep those staging areas cleaned out!