Importing MongoDB Data Using SSIS 2012

I have embarked on a little quest to learn other database platforms (especially NoSQL) as more and more of our clients at Pragmatic Works have them in their enterprise, and want to be able to import data from them into their SQL Server data warehouses using SQL Server Integration Services (SSIS). While I found several articles that showed how to do so, these were outdated due to changes in the MongDB C# driver. After quite a bit of effort figuring out how to get this working, I thought I’d pass along my hard fought knowledge.

First, I assume you are familiar with MongoDB (http://www.mongodb.org/) and SQL Server (https://www.microsoft.com/en-us/sqlserver/default.aspx). In my examples I am using SSIS 2012 and MongoDB 2.4.8, along with the C# driver version 1.7 for MongoDB available at http://docs.mongodb.org/ecosystem/drivers/csharp/ .

First, download and install the C# driver. This next step is important, as there was a change that occurred with version 1.5 of the driver: the DLLs are no longer installed in the GAC (Global Assembly Cache) automatically. They must be there, however, for SSIS to be able to use them.

By default, my drivers were installed to C:\Program Files (x86)\MongoDB\CSharpDriver 1.7. You’ll want to open a CMD window in Administrator mode, and navigate to this folder. Next you’ll need GACUTIL, on my computer I found the most recent version at:

C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\

A simple trick to find yours: Since you are already in the CMD window, just move to the C:\Program Files (x86) folder, and do a “dir /s gacutil.exe”. It will list all occurrences of the program, just use the one with the most recent date. Register the dlls by entering these commands:

“C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\gacutil” /i MongDB.Bson.dll

“C:\Program Files (x86)\Microsoft SDKs\Windows\v8.1A\bin\NETFX 4.5.1 Tools\x64\gacutil” /i MongDB.Driver.dll

Note the “ quote marks around the path are important for the CMD window to correctly separate the gacutil program from the parameters.

Once that is done, create a new SQL Server Integration Services project in SQL Server Data Tools (SSDT), what used to be called BIDS in SQL Server 2008R2 (and previous). Put a Data Flow Task on the Control Flow design surface. Then open the Data Flow Task for editing.

Next, drag and drop a Script Component transformation onto the Data Flow design surface. When prompted, change the component type to Source.

image

Now edit the script transform by double clicking on it. Move to the Inputs and Outputs page. For my test, I am using the dbo.DimCurrency collection I created using the technique I documented in the previous post, Exporting Data from SQL Server to CSV Files for Import to MongoDB Using PowerShell ( https://arcanecode.com/2014/01/13/exporting-data-from-sql-server-to-csv-files-for-import-to-mongodb-using-powershell/ )

I renamed the output from “output” to “MongoDB_DimCurrency”. I then added four columns, CurrencyName, CurrencyAlternateKey, CurrencyKey, and ID.

image

Make sure to set CurrencyName, CurrencyAlternateKey, and ID to “Unicode string [DT_WSTR]” Data Type. Then change CurrencyKey to “four byte signed integer [DT_I4]”.

Now return to the Script page and click Edit Script. In the Solution Explorer pane, expand References, right click and pick Add Reference. Go to Browse, and navigate to the folder where the MongoDB C# drivers are installed. On my system it was in C:\Program Files (x86)\MongoDB\CSharpDriver 1.7\. Add both MongoDB.Driver.dll and MongoDB.Bson.dll.

image

Click OK when done, your Solution Explorer should now look something like:

image

Now in the script, expand the Namespaces region and add these lines:

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Bson.Serialization;

Now scroll down to the CreateNewOutputRows() procedure. Here is a sample of the code I used:

public override void CreateNewOutputRows()
{
  string connectionString = "mongodb://localhost";
  string databaseName = "AdventureWorksDW2014";

  var client = new MongoClient(connectionString);
  var server = client.GetServer();
  var database = server.GetDatabase(databaseName);
  string CurrencyKey = "";

  foreach (BsonDocument document in database.GetCollection<BsonDocument>("dbo.DimCurrency").FindAll())
  {
    MongoDBDimCurrencyBuffer.AddRow();
    MongoDBDimCurrencyBuffer.CurrencyName = document["CurrencyName"] == null ? "" : document["CurrencyName"].ToString();
    MongoDBDimCurrencyBuffer.CurrencyAlternateKey = document["CurrencyAlternateKey"] == null ? "" : document["CurrencyAlternateKey"].ToString();
    CurrencyKey = document["CurrencyKey"] == null ? "" : document["CurrencyKey"].ToString();

    MongoDBDimCurrencyBuffer.CurrencyKey = Convert.ToInt32(CurrencyKey);
    MongoDBDimCurrencyBuffer.ID = document["_id"] == null ? "" : document["_id"].ToString();
  }

}

I start by defining a connection string to the MongoDB server, followed by the database name. I then create a MongoClient object. Note the MongoClient is the new way of connecting to the MongoDB server. In earlier versions of the C# driver, you used MongoServer objects.

I then cycle through each document in the collection “dbo.DimCurrency”, using the FindAll() method. For each item I use the AddRow() method to add a row to the buffer. In order to find the proper name for the buffer I went to the Solution Explorer and expanded the BufferWrapper.cs file. This is a class created by the script transform with the name of the output buffer.

image

For each column in my outputs, I map a column from the document. Note the use of the ternary operator ? : to strip out nulls and replace them with empty strings. String columns you can map directly from the document object to the output buffers columns.

The CurrencyKey column, being an integer, had to be converted from a string to an integer. To make it simple I created a string variable to hold the return value from the document, then used the Convert class to convert it to an INT 32.

Once you’ve done all the above, validate the code by building the code. If that all checks out save your work, close the code window, then close the Script Transformation Editor by clicking OK.

Now place a destination of some kind on the Data Flow. Since I have my company’s Task Factory tools I used a TF Terminator Destination, but you could also use a Row Count destination. On the precedence constraint between the two, right click and Enable Data Viewer. Execute the package, if all goes well you should see:

image

A few final notes. This test was done using a MongoDB document schema that was flat, i.e. it didn’t have any documents embedded in the documents I was testing with. (Hopefully I’ll be able to test that in the future, but it will be the subject of a future post.) Second, the key was the registering of the DLLs in the GAC. Until I did that, I couldn’t get the package to execute. Finally, by using the newer API for the MongoDB objects I’ve ensured compatibility for the future.

Advertisements

One thought on “Importing MongoDB Data Using SSIS 2012”

  1. This frankly seems quite painful to me. Are there any plans to add MongoDB as a data source to the PragmaticWorks TaskFactory package?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s