SQL Saturday Atlanta #285–Everything You Ever Knew About SSIS Is Null and Void

For the SQL Saturday #285 in Atlanta, May 3rd 2014, I am presenting “Everything You Ever Knew About SSIS Is Null and Void”. You’ll find the presentation itself at:

http://sqlsaturday.com/viewsession.aspx?sat=285&sessionid=19929

All of the SSIS samples including the simple SSIS packages, SQL Scripts, and Slides will be available there shortly.

The main pieces of code, the SQL scripts to  monitor SSIS, can also be found in my TechNet Gallery:

http://gallery.technet.microsoft.com/SQL-to-Monitor-and-Explore-e485fa84

Thanks for attending!

Importing MongoDB Data Using SSIS 2012

I have embarked on a little quest to learn other database platforms (especially NoSQL) as more and more of our clients at Pragmatic Works have them in their enterprise, and want to be able to import data from them into their SQL Server data warehouses using SQL Server Integration Services (SSIS). While I found several articles that showed how to do so, these were outdated due to changes in the MongDB C# driver. After quite a bit of effort figuring out how to get this working, I thought I’d pass along my hard fought knowledge.

First, I assume you are familiar with MongoDB (http://www.mongodb.org/) and SQL Server (https://www.microsoft.com/en-us/sqlserver/default.aspx). In my examples I am using SSIS 2012 and MongoDB 2.4.8, along with the C# driver version 1.7 for MongoDB available at http://docs.mongodb.org/ecosystem/drivers/csharp/ .

First, download and install the C# driver. This next step is important, as there was a change that occurred with version 1.5 of the driver: the DLLs are no longer installed in the GAC (Global Assembly Cache) automatically. They must be there, however, for SSIS to be able to use them.

By default, my drivers were installed to C:\Program Files (x86)\MongoDB\CSharpDriver 1.7. You’ll want to open a CMD window in Administrator mode, and navigate to this folder. Next you’ll need GACUTIL, on my computer I found the most recent version at:

C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\

A simple trick to find yours: Since you are already in the CMD window, just move to the C:\Program Files (x86) folder, and do a “dir /s gacutil.exe”. It will list all occurrences of the program, just use the one with the most recent date. Register the dlls by entering these commands:

“C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\gacutil” /i MongDB.Bson.dll

“C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\gacutil” /i MongDB.Driver.dll

Note the “ quote marks around the path are important for the CMD window to correctly separate the gacutil program from the parameters.

Once that is done, create a new SQL Server Integration Services project in SQL Server Data Tools (SSDT), what used to be called BIDS in SQL Server 2008R2 (and previous). Put a Data Flow Task on the Control Flow design surface. Then open the Data Flow Task for editing.

Next, drag and drop a Script Component transformation onto the Data Flow design surface. When prompted, change the component type to Source.

image

Now edit the script transform by double clicking on it. Move to the Inputs and Outputs page. For my test, I am using the dbo.DimCurrency collection I created using the technique I documented in the previous post, Exporting Data from SQL Server to CSV Files for Import to MongoDB Using PowerShell ( http://arcanecode.com/2014/01/13/exporting-data-from-sql-server-to-csv-files-for-import-to-mongodb-using-powershell/ )

I renamed the output from “output” to “MongoDB_DimCurrency”. I then added four columns, CurrencyName, CurrencyAlternateKey, CurrencyKey, and ID.

image

Make sure to set CurrencyName, CurrencyAlternateKey, and ID to “Unicode string [DT_WSTR]” Data Type. Then change CurrencyKey to “four byte signed integer [DT_I4]”.

Now return to the Script page and click Edit Script. In the Solution Explorer pane, expand References, right click and pick Add Reference. Go to Browse, and navigate to the folder where the MongoDB C# drivers are installed. On my system it was in C:\Program Files (x86)\MongoDB\CSharpDriver 1.7\. Add both MongoDB.Driver.dll and MongoDB.Bson.dll.

image

Click OK when done, your Solution Explorer should now look something like:

image

Now in the script, expand the Namespaces region and add these lines:

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Bson.Serialization;

Now scroll down to the CreateNewOutputRows() procedure. Here is a sample of the code I used:

public override void CreateNewOutputRows()
{
  string connectionString = "mongodb://localhost";
  string databaseName = "AdventureWorksDW2014";

  var client = new MongoClient(connectionString);
  var server = client.GetServer();
  var database = server.GetDatabase(databaseName);
  string CurrencyKey = "";

  foreach (BsonDocument document in database.GetCollection<BsonDocument>("dbo.DimCurrency").FindAll())
  {
    MongoDBDimCurrencyBuffer.AddRow();
    MongoDBDimCurrencyBuffer.CurrencyName = document["CurrencyName"] == null ? "" : document["CurrencyName"].ToString();
    MongoDBDimCurrencyBuffer.CurrencyAlternateKey = document["CurrencyAlternateKey"] == null ? "" : document["CurrencyAlternateKey"].ToString();
    CurrencyKey = document["CurrencyKey"] == null ? "" : document["CurrencyKey"].ToString();

    MongoDBDimCurrencyBuffer.CurrencyKey = Convert.ToInt32(CurrencyKey);
    MongoDBDimCurrencyBuffer.ID = document["_id"] == null ? "" : document["_id"].ToString();
  }

}

I start by defining a connection string to the MongoDB server, followed by the database name. I then create a MongoClient object. Note the MongoClient is the new way of connecting to the MongoDB server. In earlier versions of the C# driver, you used MongoServer objects.

I then cycle through each document in the collection “dbo.DimCurrency”, using the FindAll() method. For each item I use the AddRow() method to add a row to the buffer. In order to find the proper name for the buffer I went to the Solution Explorer and expanded the BufferWrapper.cs file. This is a class created by the script transform with the name of the output buffer.

image

For each column in my outputs, I map a column from the document. Note the use of the ternary operator ? : to strip out nulls and replace them with empty strings. String columns you can map directly from the document object to the output buffers columns.

The CurrencyKey column, being an integer, had to be converted from a string to an integer. To make it simple I created a string variable to hold the return value from the document, then used the Convert class to convert it to an INT 32.

Once you’ve done all the above, validate the code by building the code. If that all checks out save your work, close the code window, then close the Script Transformation Editor by clicking OK.

Now place a destination of some kind on the Data Flow. Since I have my company’s Task Factory tools I used a TF Terminator Destination, but you could also use a Row Count destination. On the precedence constraint between the two, right click and Enable Data Viewer. Execute the package, if all goes well you should see:

image

A few final notes. This test was done using a MongoDB document schema that was flat, i.e. it didn’t have any documents embedded in the documents I was testing with. (Hopefully I’ll be able to test that in the future, but it will be the subject of a future post.) Second, the key was the registering of the DLLs in the GAC. Until I did that, I couldn’t get the package to execute. Finally, by using the newer API for the MongoDB objects I’ve ensured compatibility for the future.

SSIS tip for Lookup Transformations

The Lookup Transformation is a cornerstone of almost any SSIS package. The vast majority of packages that load Fact tables use Lookups extensively. Many of these lookups reference the same tables over and over.

As we all (hopefully) know by now, opting to lookup against a table is akin to doing a SELECT *. Best practices guide you to only select the columns you really need for a lookup, typically the surrogate key and business key. Because I often reference the same tables over and over, I’ve taken to keeping all my frequently referenced tables in a single SQL file.

I typically store all my projects in a “Projects” folder off my root directory (aka C:\). Under it I create a folder for each project I work on. Within there I create a folder simply called SQL, where I store some SQL scripts. Some are just temporary as I work through issues, or as in this case a good place to store my commonly used lookups. It will wind up looking something like:

-- Employee SELECT DimEmployeeId, EmployeeBusinessKey FROM DimEmployee -- Company SELECT DimCompanyId, CompanyBusinessKey FROM DimCompany -- Office SELECT DimOfficeId, OfficeBusinessKey FROM DimOffice -- More here

That’s a very generic example, but you get the idea. Simple, but very handy.

Creating a Data Warehouse Date Id in Task Factory Advanced Derived Column Transformation

The company I work for, Pragmatic Works, makes a great tool called Task Factory. It’s a set of transformations that plug into SQL Server Integration Services and provides a wealth of new controls you can use in your packages. One of these is the Advanced Derived Column Transformation. If you are familiar with the regular Derived Column transformation built into SSIS, you know that it can be painful to use if you have to create anything other than a very basic calculation. Every try typing something complex into that single row tiny little box? Egad.

The Task Factory Advanced Derived Column transform allows you to pop up a dialog and have true multi-line editing. In addition there are 180 addition functions to make your life easier. Which is actually the point of this whole post.

As a Business Intelligence developer, one of the things I have to do almost daily is convert a date data type to an integer. Most dates (at least in the US) are in Month / Day / Year format. Overseas the format is usually Day / Month / Year (which to me makes more sense). SQL Server Analysis Services loves integer based field, so a common practice is to store dates as an integer in YYYYMMDD format.

Converting a date to an integer using the derived column transform can be ugly. Here’s an example of a fairly common (although not the only) way to do it:


(DT_I4)((DT_WSTR,4)YEAR(MyDateColumn) + RIGHT("00" + (DT_WSTR,2)MONTH(MyDateColumn),2) + RIGHT("00" + (DT_WSTR,2)DAY(MyDateColumn),2))

Task Factory makes this much easier. There is a ToChar function which converts columns or values to characters. This function allows you to pass in a format to convert to. Wrap all that in a ToInteger function and away you go. Check this out:


ToInteger(ToChar(MyDateColumn, "yyyyMMdd"))

Much, much simpler. One thing, the case of the format is very important. It must be yyyyMMdd, otherwise it won’t work. If you want to extend this more, you can actually check for a null, and if it is null return a –1 (a common Id for a missing row) or another special integer to indicate a missing value, such as 19000101.


IIf(IsNull(MyDateColumn)
   , -1
   , ToInteger(ToChar(MyDateColumn, "yyyyMMdd"))
   )

Here we first check to see if the column is null, if so we return the missing value, else we return the date converted integer. And yes, you can do multi line code inside the Advanced Derived Column Transformation.

As you can see the Advanced Derived Column Transformation makes working with dates much, much easier than the standard derived column transformation. This is such a common need that, at the risk of sounding like an ad, I decided to blog about it so I can share this with all my clients in the future.

(Just to be clear, it’s not an ad, I was not asked to do this, nor did I receive any money for it. Mostly I did this post just so I could share the syntax when I start each project or training class.)

Using TFS2010 with Visual Studio / BIDS 2008 and SQL Server Management Studio

When I come to a customer site, I often have to help them get setup with TFS (Team Foundation Server) 2010, Microsoft’s source code control / ALM (application lifecycle management) system. This is so they can work with their BIDS (Business Intelligence Developer Studio) projects as a team, giving the added benefit of source code control. I’ve had to do this often enough I wanted to record the steps for my own use, and hopefully others too.

Installing the TFS 2010 tools for Visual Studio / BIDS 2008

First off, thanks to Derek Miller for covering most of the steps involved in his blog post http://derekjmiller62.wordpress.com/2010/10/19/using-tfs-2010-with-bids-2008/. I won’t go into the detail he did, but will summarize into these basic steps.

1. If you haven’t installed Visual Studio 2008 Service Pack 1, do so by downloading it and installing.

2. Next, you will need to install the Visual Studio 2008 Team Explorer.

3. After installing Team Explorer, you will have to go back and reinstall VS SP1 (from step 1). Don’t skip this step! Team explorer has some older components that overwrite the SP1 components, and you will have reinstall them.

Now this next part I really haven’t seen anywhere else and was a real pain to find, and thus is the main reason for this post. During the SP1 install, we often see “Visual Studio SP1 Installation Failed”. Checking the error log, buried deep you will find “Returning IDOK. INSTALLMESSAGE_ERROR [Error 2902. An internal error has occurred. …”

When you see this, go to your Control Panel, and then to Add Remove Programs. Look for a program called “Microsoft Visual Studio Web Authoring Component” and uninstall it. This is actually installed as part of the Office suite, and you don’t really need it since you likely have much more powerful web authoring tools, or since you are doing BI development won’t be doing an web development in Microsoft Office.

After uninstalling it, SP1 should then install, and you are ready for step 4.

4. Install the Visual Studio Team System 2008 SP1 Forward Compatibility Update for Team Foundation Server 2010. That probably took you longer to read than it actually will to install. After installing, it may prompt you to reboot. Even if it doesn’t ask you should reboot anyway, we’ve seen a few times when we weren’t able to connect until we rebooted.

After that you should be able to go into Visual Studio and go to Tools, Connect to Team Foundation Server. If you still have problems connecting, I will refer you to Derek’s post where he describes some registry entries you can try. So far we haven’t found them necessary, but you may.

Installing the TFS 2010 Tools

Note that there is one big limitation to using TFS 2010 with VS2008. You can connect to a TFS site and upload your solutions and projects, but you can’t create a new team site with VS2008. To do so, you will need the VS2010 shell with the TFS components, a free download.

Installing TFS 2010 for SQL Server Management Studio (SSMS)

Now that you have BIDS all setup to work with TFS, it only makes sense to make your SQL Server Management Studio (SSMS) also work with TFS. Joseph Jun has a great blog post that goes into all the nitty gritty of how to do this. The short version though, is after you install the TFS 2010 tools in the step above (and they are a prerequisite) you need to install the Team Foundation Server MSSCCI Provider 2010.

After the install, you should see a new Source Control menu option under the File menu in SSMS. From here you can launch the TFS 2010 management shell or open an existing SSMS project / solution. If you have a solution you need to add, simply right click on the solution in the Solution Explorer window and pick Add to Source Control.

Visual Studio Database Projects

Note that if you are using Visual Studio Database Projects, any SQL Server 2008R2 development must be done in Visual Studio 2010. VS2010 is already setup to talk to TFS 2010. If you are using VS 2008 database projects to build a SQL Server 2008 (non-R2) database, then with the steps above you should be good to go for checking in your database project into TFS.

And away we go!

And with that you should be setup to manage your BI Development in Team Foundation Server 2010. It’s a lot of work, but well worth the effort. Using TFS will let your BI staff work as a team to develop projects. Additionally you have the benefit of source code control, something invaluable in the case of package corruptions or needing to track history.

SSIS Training Resources

I’ve been asked to provide links to some useful resources for learning about SQL Server Integration Services. Below are a list of my favorite blogs, books, and other sites to learn from.

A quick disclaimer, some of the links below are by co-workers or other people I have an affiliation with, financial or otherwise. That’s because I’m lucky enough to work with some of the best people in the field. Also, in the case of the books I’ve linked to the Kindle version where possible, mostly because I’m a Kindle junkie. There are paper versions of the books, and you are free to buy from your favorite retailer.

 

Books

Microsoft SQL Server 2008 Integration Services: Problem, Design, Solution – I’m a big fan of problem / design / solution books. They do a great job of teaching you how to solve real world issues. The chapter by Jessica Moss on package configurations is especially useful, one I refer back to often.

Professional Microsoft SQL Server 2008 Integration Services – This is another of those big thick books (1000+ pages if you decide to lug around the paper version) that covers the gambit of everything you ever wanted to know about SSIS but were afraid to ask.

Knight’s 24-hour Trainer: SQL Server 2008 Integration Services – This is one time where I suggest getting the physical book, as it is accompanied by a DVD full of videos focused on SSIS. They will take you from an absolute beginner and walk you through each of the most used Tasks and Transformations used in SSIS. If you want to get up to speed fast on SSIS this is the book to get.

 

Blogs

Jamie Thomson – Better known as the SSIS Junkie, Jamie posts some really advanced topics on his blog.

Andy Leonard – Andy is a whiz at SSIS, and does a great job at both teaching and consulting.

Christopher Price – A very active blogger on SSIS.

 

Videos

Pragmatic Works Webinars – On our website we have a big catalog of past webinars (all of which are free to watch), many of which focus on SSIS. 

Pluralsight – Pluralsight has an extensive catalog of courses, including an SSIS course. It’s subscription bases so there is a modest fee (starts at $29 US per month last I checked) but well worth it for the training you can get. There’s also a free trial.

SQL Share – This site takes a new twist on videos, in that each video is very short and very focused on one specific task. Lots of good stuff on here on SSIS, much of it by Brian Knight. Free.

 

For a quick link to this post, you can use http://bit.ly/arcanessis.

The Great Pragmatic Works Task Factory Round Up

Over the last month or so, I’ve been blogging about my companies (Pragmatic Works) cool suite of SSIS (SQL Server Integration Services) add in tools called Task Factory. I did this mostly to learn about the various components, and decided to video the whole thing in order to a) help myself remember and b) share the knowledge with everyone.

I’m about to move onto some of our other tools, and as I explore and learn them will be adding videos as I go. I wanted to wrap up the Task Factory series with a summary of all the tools, and links to the blog post and the direct link to the video. Please note I’ve listed these alphabetically, to make them easy to find, instead of the order in which they were published. 

Task Factory Address Parse video | blog

Task Factory Advanced E-Mail and SMS Task video | blog

Task Factory Case Transform video | blog

Task Factory Compression video | blog

Task Factory Data Cleansing video | blog

Task Factory Data Validation video | blog

Task Factory Delete Batch Transform video | blog

Task Factory Dimension Merge Slowly Changing Dimension video | blog

Task Factory Email Source video | blog

Task Factory File Properties video | blog

Task Factory Null Handler blog

Task Factory Replace Unwanted Characters video | blog

Task Factory RegEx video | blog

Task Factory SalesForce.com Source and Destination blog

Task Factory SharePoint Destination video | blog

Task Factory SharePoint Source video | blog

Task Factory Surrogate Key transform video | blog

Task Factory Terminator Destination blog

Task Factory Trim Plus video | blog

Task Factory Update Batch transform video | blog

Task Factory Upsert Destination video | blog

Task Factory XML Destination video | blog

Follow

Get every new post delivered to your Inbox.

Join 106 other followers