Exporting Data from SQL Server to CSV Files for Import to MongoDB Using PowerShell

I’ve been exploring other database systems, in order to determine how to import data from them using SQL Server Integration Services (SSIS). My first step though was to create some test data. I wanted something familiar, so I decided to export the Adventure Works Data Warehouse sample database and import into MongoDB. While I had many options I decided the simplest way was to first export the data to CSV files, then use the MongoDB utility mongimport. Naturally I turned to PowerShell to create an automated, reusable process.

First, if you need the Adventure Works DW database, you’ll find it at http://msftdbprodsamples.codeplex.com/. Second, I did my export from a special version of Adventure Works DW I created called AdventureWorksDW2014. This is optional, but if you want to have a version of Adventure Works DW updated with current dates, see my post at http://arcanecode.com/2013/12/08/updating-adventureworksdw2012-for-2014/. Third, I assume you are familiar with MongoDB, but if you want to learn more go to http://www.mongodb.org/.

Below is the PowerShell 3 script I created. The script is broken into four regions. The first, User Settings, contains the variables that you the user might need to change to get the script to run. It has things like the name of the SQL Server database, the path to MongoDB, etc.

The second region, Common, establishes variables that are used by the remaining two regions. You shouldn’t need to change or alter these. The third region accesses SQL Server and exports each table in the selected database to a CSV format file.

The final region, “Generate MongoDB import commands”, creates a batch (.BAT) file which has all the commands needed to run mongoimport for each CSV file. I decided not to have the PowerShell script execute the .BAT file so it could be reviewed before it is run. There might be some tables you don’t want to import, etc.

It is also quite easy to adapt this script to just create CSV files from SQL Server without using the MongoDB piece. Simply remove the fourth and final region, then in the Common and User Settings regions remove any variables what begin with the prefix “mongo”.

As the comments do a good job of explaining what happens I’ll let you review the included documentation for step by step instructions.

#==================================================================================================
# SQLtoCSVtoMongoDb.ps1
# Robert C. Cain | @ArcaneCode |
http://arcanecode.com
#
# If you need a simple way to export data from SQL Server to MongoDb, here is one way to do it.
# The script starts by setting up some variables to the server environment (see the User Settings
# region)
#
# Next, it exports data from each table in the selected database to individual CSV files.
# Finally, it generates a batch file which executes mongoimport for each csv file to import
# into MongoDb.
#
# I broke this into four regions so if all that is desired is a simple export of data to CSVs,
# you can simply omit the final region along with any variables that begin with "mongo".
#
# While I could have gone ahead and run the batch file at the end, I chose not to in order to
# give you time to review the output prior to running the batch file.
#==================================================================================================

Clear-Host

#region User Settings

  # In this section, set the variables so they are appropriate for your project / environment
 
  # This is the spot where you want to store the generated CSVs.
  # Make sure it does NOT end in a \
  $csvPath = "C:\mongodb"

  # If you are running this on a computer other than the server, set the name of the server below
  $sqlServer = $env:COMPUTERNAME

  # If you have a named instance be sure replace "default" with the name of the instance
  $sqlInstance = "\default"

  # Enter the name of the database to export below
  $sqlDatabaseName = "AdventureWorksDW2014"

  # The settings below only apply to the MongoDB code generation
  # Assemble path to mongodb. This assumes utlities are stored in the default bin folder
  $mongoPath = "C:\mongodb"
  $mongoImport = "$mongoPath\bin\mongoimport"

  # Set the server name and port
  $mongoHost = "localhost"   # Leave blank to default to localhost
  $mongoPort = ""            # Leave blank to default to 27107
 
  # Set the user name and password, leave blank if it isn’t needed
  $mongoUser = ""
  $mongoPW = ""

  # Enter the name of the database to import to.
  $mongoDatabaseName = "AdventureWorksDW2014"

  # Upserts are REALLY slow, especially on large datasets. Setting this to $true will turn off
  # the upsert option. If set to true, you are responsible for either deleting all documents
  # in the collection before hand, or allowing the risk of duplicates.
  #
  # Setting to false will enable the upsert option for mongoimport, and attempt to determine the
  # keys and (if found) add them to the final mongoimport command.
  $mongoNoUpsert = $true

#endregion

#region Common ————————————————————————————
 
  # This section sets variables used by both regions below. There is no need to alter anything
  # in this region.

  # Import the SQLPS provider (if it’s not already loaded)
  if (-not (Get-PSProvider SqlServer))
    { Import-Module SQLPS -DisableNameChecking }

  # Assemble the full servername \ instance
  $sqlServerInstance = "$sqlServer\$sqlInstance"

  # Assemble the full path for the SQL Provider to get to the database
  $sqlDatabaseLocation = "SQLSERVER:\sql\$sqlServerInstance\databases\$sqlDatabaseName"

  # Now tack on the Tables ‘folder’ to the SQL Provider path, the move there
  $sqlTablesLocation = $sqlDatabaseLocation + "\Tables"
  Set-Location $sqlTablesLocation

  # Get a list of tables in this database
  $sqlTables = Get-ChildItem

#endregion

#region Export SQL Data —————————————————————————
  # In this section we will export data from each table in the database to a CSV file.
  # WARNING: If the CSV file exists, it will be overwritten.

  # These are just used to display informational messages during processing
  $sqlTableIterator = 0
  $sqlTableCount = $sqlTables.Count

  # Iterate over each table in the database
  foreach($sqlTable in $sqlTables)
  {
    $sqlTableName = $sqlTable.Schema + "." + $sqlTable.Name   

    # I’ll grant you the next little bit of formatting for the progress messages is a bit
    # OCD on my part, but I like my output formatted and easy to read.
    $sqlTableIterator++
    $padCount = " " * (1 + $sqlTableCount.ToString().Length – $sqlTableIterator.ToString().Length)
    $sqlTableIteratorFormatted = $padCount + $sqlTableIterator

    if( $sqlTableName.Length -gt 50 )
      { $padTable = " " }
    else
      { $padTable = " " * (50 – $sqlTableName.Length) }

    Write-Host -ForegroundColor White -NoNewline "Processing Table $sqlTableIteratorFormatted of $sqlTableCount : $sqlTableName $padTable"
   
    # If the instance is "default", we have to exclude it when we use Invoke-SqlCmd
    if($sqlInstance.ToLower() -eq "\default")
      { $sqlSI = $sqlServer }
    else
      { $sqlSI = $sqlServerInstance }

    # Load an object with all the data in the table
    # Note if you have especially large tables you may need to modify this
    # section to break things into smaller chunks.
    $sqlCmd = "SELECT * FROM " + $sqlTableName
    $sqlData = Invoke-Sqlcmd -Query $sqlCmd `
                             -ServerInstance $sqlSI `
                             -SuppressProviderContextWarning `

    # Now write the data out.
    # Note utf8 encoding is important, as it is all mongoimport understands
    # Also need to omit the Type Info header PowerShell wants to write out
    Write-Host -ForegroundColor Yellow "    Writing to table $sqlTableName.csv"
    $sqlData | Export-Csv -NoTypeInformation -Encoding "utf8" -Path "$csvPath\$sqlTableName.csv"

  }

  # Just add a blank line after the processing ends
  Write-Host

#endregion

#region Generate MongoDB import commands ———————————————————-

  # In this region we will generage the commands to import our newly exported data
  # into an existing database in MongoDB. This is an example of our desired output (wrapped
  # onto multiple lines for readability, in the output it will be a single line):

  #  C:\mongodb>bin\mongoimport –host localhost -port 27107
  #                             –db AdventureWorksDW2014 –collection DimSalesReason
  #                             –username Me –password mySuperSecureP@ssW0rd!
  #                             –type csv –headerline –file DimSalesReason.csv
  #                             –upsert –upsertFields SalesReasonKey

  # Note several of these parameters are optional, and could use defaults, or be potentially
  # omitted from the final output, based on the choices at the very beginning of this script

  # Feel free to alter the $mongoCommand as needed for other circumstances

  # Final warning, the database must already exist in MongoDb in order to import the data. This
  # script will not generate the database for you.

  # Create the name for the batch file we will generate
  $mongoBat = $csvPath + "\Import_SQL_" + $sqlDatabaseName + "_to_MongoDb_" + $mongoDatabaseName + ".bat"

  # See if file exists, if so delete it
  if (Test-Path $mongoBat)
    { Remove-Item $mongoBat }

  # These are just used to display informational messages during processing
  $sqlTableIterator = 0
  $sqlTableCount = $sqlTables.Count

  # mongoimport allows us to do upserts, helping to eliminate duplicate rows on import.
  #
  # To make an upsert work there has to be a key column to match up on. Fortunately,
  # most tables in the SQL Server world have Primary Keys, so we can find out what
  # columns those are and add it to the command. Note if there is no PK in SQL Server,
  # no upsert will be attempted.
  #
  # Note though that upserts are REALLY slow, so the option to skip them is
  # built into the script and set at the top (mongoNoUpsert). The generated batch file
  # assumes that either a) you have deleted all data from the collection ahead of time,
  # or b) you are OK with the risk of duplicate data.

  # Iterate over each table in the database to build the mongoimport command
  foreach($sqlTable in $sqlTables)
  {
    $sqlTableName = $sqlTable.Schema + "." + $sqlTable.Name

    # A bit more OCD progress messages
    $sqlTableIterator++
    $padCount = " " * (1 + $sqlTableCount.ToString().Length – $sqlTableIterator.ToString().Length)
    $sqlTableIteratorFormatted = $padCount + $sqlTableIterator
    Write-Host -ForegroundColor Green "Building mongoimport command for table $sqlTableIteratorFormatted of $sqlTableCount : $sqlTableName"

    # Begin building the command
    $mongoCommand = "$mongoImport "
   
    if ($mongoHost.Length -ne 0)
      { $mongoCommand += "–host $mongoHost " }

    if ($mongoPort.Length -ne 0)
      { $mongoCommand += "–port $mongoPort " }

    $mongoCommand += "–db $mongoDatabaseName –collection $sqlTableName "

    if ($mongoUser.Length -ne 0)
      { $mongoCommand += " –username $mongoUser –password $mongoPW " }

    $mongoCommand += " –type csv –headerline –file $csvPath\$sqlTableName.csv "
       
    # Build the upsert clause, if the user has elected to use it.
    if ($mongoNoUpsert -eq $false)
    {
      $mongoPKs = ""
      foreach($sqlIndex in $sqlTable.Indexes)
      {
        if($sqlIndex.IndexKeyType -eq ‘DriPrimaryKey’)
        {
          foreach($sqlCol in $sqlIndex.IndexedColumns) #$sqlPKColumns)
          {
            if ($mongoPKs.Length -ne 0)
              { $mongoPKs += "," }
            # Note column names are returned with [ ] around them, and must be removed
            # Have to use -replace instead of .Replace() because $sqlCol is an column not a string
            $mongoPKs += ($sqlCol -replace "\[", "") -replace "\]", ""
          }
               
          $mongoCommand += " –upsert –upsertFields $mongoPKs"
        }           
      }
    }

    # Append the command to the batch file
    $mongoCommand | Out-File -FilePath $mongoBat -Encoding utf8 -Append

  }

  # Just add a blank line after the processing ends
  Write-Host

#endregion

 

Installing Ubuntu 8.04 under Microsoft Virtual PC 2007

Update: Nov. 10, 2008 – New blog post on installing Ubuntu 8.10 is now out: http://tinyurl.com/vpcubuntu810

I’m pleased to say that Ubuntu 8.04 is probably the easiest install I’ve had to do with VPC yet! One quick reminder before we begin, when working inside the VPC your mouse will get “trapped” or captured by the virtual computer. You won’t be able to move outside of it. To get it released, just press the RIGHT side ALT key. Left side won’t work, has to be the RIGHT side of your keyboard.

To start with, create a new Virtual PC. For a tutorial, see either my step by step tutorial or the video tutorial if you need more instructions. Since I had the space, I was using 768 meg of ram, and left the disk space at the default of 16 gig. If you can, try and use at least 512 meg of ram for good performance. Use the CD menu option to capture the desktop ISO you downloaded from Ubuntu, or if you have a real CD put it in the drive and capture that. When it launches, you’ll see this screen. (By the way, you can click on any of the screens to see the full size graphic, these have been resized slightly to fit in with most common browser sizes).

[image - Select Language]

Pick your language, I just took the default of English.

[image - Safe graphics mode]

Now press F4 to select an alternate starting mode. When it pops up, change to Safe graphics mode, as you see above, and press Enter. Now pick “Try Ubuntu…” (should already be selected) and press enter. Do NOT pick the Install Ubuntu option, I kept getting VPC errors when trying to install directly.

Additionally, don’t be alarmed if the screen goes black for a while, then you see some garbled graphics. This is perfectly normal, it is just passing through and will be OK when Ubuntu gets done doing it’s thing. It took me about 7 minutes to get from the previous screen to the next one.

[image - live environment]

After it boots you should be in the live session trial environment. Double click the Install icon to begin the install process.

[image - Installer welcome screen]

Screen 1 is just a welcome screen, although you can change your language here if you need to. Press Forward to continue.

[image - Installer Set Time Zone]

Next it wants to know where you are, at least time zone wise. I’m in the central time zone, but set yours appropriately and click Forward.

[image - Installer Pick your Keyboard]

Next you can specify your keyboard. Since I’m using a typical USA style keyboard, I just clicked Forward.

[image - Installer Prepare Disk Space]

Next it asks how you want your disk space partitioned. Since we’re in a virtual environment, it made the most sense to just take the defaults and click Forward.

Be aware, after clicking forward my mouse went into the “I’m busy” mode, and there was a delay while the disks were prepared. Mine went about five minutes. Don’t be alarmed, just wait a few minutes and you’ll then proceed to the next screen.

[image - Installer Who Are You]

On this screen, first supply your name; this will be used in documents and the like. The next text box is the important one – it is for your Ubuntu user name. By default it uses your first name, now is your chance to change it. I rather like mine so will accept it. Next you’ll need to key in a good password and confirm, and finally name the computer. When you are happy, click Forward.

Now is where you may get confused. In the screen above, you are on step 5 of 7. When you click forward, you are suddenly on step 7 of 7. I’m not sure what happened to step 6, I even ran the installer yet one more time just to make sure it was gone. Perhaps it was kidnapped by space aliens?

[image - Installer is Ready]

Apparently even without the missing step 6, the installer has everything it needs. Just click Install to begin the install process. Kick back and wait. Don’t be alarmed if the screen goes black during the process, it’s just the screen saver kicking in. Just click in the VPC and wiggle your mouse and your display will return. I had it kick in several times during the 45 (or so) minutes it took to get everything installed.

[image - Install complete time to reboot]

Eventually Ubuntu will complete it’s install, then give you the above message. On the Virtual PC menu click CD, then release the cd. Then click on the big Restart now button inside VPC.

This was the only real snag I hit in the whole install process, I waited a while and it never did restart on its own. I gave it about five minutes, then in the Virtual PC menu I clicked Action, Reset. I figured since it’s already installed, I wouldn’t lose anything, and I was right.

The boot process does take a few minutes; you’ll see some text then a black screen for about 90 seconds. Then it comes up to the big Ubuntu logo and the orange bar as it loads. You’ll see some garbled graphics for a few seconds, then the login screen finally appeared. I gave it my user id and password, and minutes later I was in Ubuntu.

One last piece of business, fixing the networking. First make sure the network card is mapped to a real network card in your computer. For more instructions on this, see my video, Virtual PC Advanced Settings. After that, click on the network icon in the upper right side of the toolbar, as you see below.

[image - Fix Networking]

Then just pick Wired Network. Once connected you’ll be free to visit your favorite websites!

[image - Ubuntu open for business]

I haven’t had much time to check out other features, or get the sound working so if anyone has a quick fix for that by all means leave a comment below.

The Developer Experience

In case you’re wondering why the slowdown in the blog this week, I’ve been spending all my free time getting ready for Alabama Code Camp 6. My first presentation of the day is “The Developer Experience”. It’s chock full of practical, low cost (or even free!) ways to make your life as a programmer more productive.

As promised in the session, here’s the complete PDF of my slides:  The Developer Experience

Just Code It

Jeff Atwood has an interesting post on his Coding Horror site entitled “Yes, But What Have You *Done*?” ( http://www.codinghorror.com/blog/archives/000809.html ). Programmers, Jeff says tend to be natural introverts, and left to their natural devices will migrate toward head down coding.

“But it is possible to go too far in the other direction, too. It’s much rarer, because it bucks the natural introversion of most software developers, but it does happen. Take me, for example. Sometimes I worry that I spend more time talking about programming than actually programming.”

I know how Jeff feels. In my role as a development lead I spend a lot of time in meetings, or talking to other developers about their projects. As a result I wind up spending a lot of time late at night doing coding, just to keep up with the latest and greatest techniques.

The need for code experience directly resulted in one of my “How To Be A Better Developer…” ( http://arcanecode.wordpress.com/2007/07/13/being-a-better-developer-in-6-months/ ) pledges.

I will work all the code samples in the book. Reading is one thing, but doing is even better. Personally, I find I get a better understanding when I actually type in the code samples and run them. And not just run what’s in the book, but tweak it, experiment with it.

I’m amazed at the number of times I meet some guy who comes across as a self proclaimed expert on a subject. When I quiz the person or try to ask tough questions, it turns out said individual read a book, but never actually wrote any code. Book learning is great, I certainly buy enough books every year to know, but there’s no substitute for doing.

When learning something new, start with the samples. Work it, tweak it, understand it. Then, if appropriate use it on your project at work. If it’s not appropriate, find someone else’s project that it would be a good fit for. Offer to work a few hours unpaid overtime and contribute some code to their project. They’d probably be grateful for the help, and might repay with some pointers and critiques.

Can’t find an appropriate project at work? There are thousands of open source projects out there, find one where you could contribute. Or look around the community; find a charity that needs some programming done.

It’s easier than you think to find some real world places to apply your coding skills. So what are you waiting for, just code it!

Microsoft Goes Open Source

For years critics have been blasting Microsoft over their proprietary standards and applications. Over the last few years, however, Microsoft has slowly been answering those critics by adopting internet standards instead of insisting on their own, and releasing more things to the community.

The ability to save Office 2007 documents as XPS comes to mind, as does the ability for CardSpace to use open standards like OpenID. Now, in their next step they are embracing the open source community through the addition of a new Open Source page within Microsoft.

http://www.microsoft.com/opensource/default.mspx

On this site you can find all sorts of information and resources for those wanting to do open source projects using Microsoft software. Links to articles, websites, and the Visual Studio Express editions can be found. I won’t try to reiterate the entire site here, but if you have an interest in Open Source it’s well worth your time to have a look.

In addition is another site called Port 25. It is the outreach site for Microsoft’s Open Source Software Lab. Some really cool stuff here on Linux interoperability, as well as the new Dynamic Language support such as IronRuby and IronPython.

http://port25.technet.com/

I can tell right now I’m going to be spending a lot of time on Port 25.

Finally, I should mention a site that’s been around for a bit by the name of CodePlex. It’s Microsoft’s site to host open source project done by both Microsoft folks and those of us in the community. (Well, I say us, one day I keep swearing I’ll find time to crank out some cool project and put it on CodePlex.)

http://www.codeplex.com/

Currently they show about 2000 projects right now, so there should be a lot for you to check out.

No, I don’t foresee Vista going open source anytime soon. But I really have to hand it to Microsoft. Somewhere over the last few years they realized they weren’t the only game in town. Since then they have really made an effort to “play nice” with other communities, and embrace many new open standards. The creation of their Microsoft Open Source site is just another step in that journey.

Follow

Get every new post delivered to your Inbox.

Join 93 other followers