Introducing Microsoft PowerPivot

What is PowerPivot? Well according to Microsoft:

“PowerPivot is Microsoft Self-Service Business Intelligence”

I can see from the glazed looks you are giving your monitor that was clear as mud. So let’s step back a bit and first define what exactly is Business Intelligence.

Business Intelligence

Business Intelligence, often referred to as simply “BI”, is all about taking data you already have and making sense of it. Being able to take that information and turn it from a raw jumble of individual facts and transform it into knowledge that you can take informed actions on.

In every organization there is already someone who is doing BI, although they may not realize it. Microsoft (and many IT departments) refer to this person as “that guy”. A power user, who grabs data from anyplace he (or she) can get it, then uses tools like Excel or Access to slice it, dice it, and analyze it. This person might be an actual Business Analyst, but more often it’s someone for who BI is not their main job. Some common examples of people doing their own BI today are production managers, accountants, engineers, or sales managers, all who need information to better do their job. Let’s look at an illustration that will make it a bit clearer.

In this example, put yourself in the role of a sales manager. You have gotten IT to extract all of your sales orders for the last several years into an Excel spreadsheet. In order to determine how well your sales people are doing, you need to measure their performance. You’ve decided that the amount sold will be a good measure, and use Excel to give you totals.

IntroEx01

In BI terms, the column “Total Sales” is known as a measure, or sometimes a fact, as it measures something, in this case the sales amount. The grand total sales amount is often called an aggregation, as it totals up the individual rows of data that IT gave us. But now you might be wondering why Andy’s sales are so low? Well, now you want to dig deeper and look at sales by year.

IntroEx02

In BI terms, the names of the sales people are a dimension. Dimensions are often either a “who” (who sold stuff) or a “what” (what stuff did we sell). Places (where was it sold) and dates (when was it sold) are also common dimensions. In this case the sales dates across the top (2007, 2008, 2009) are a date dimension. When we use two or more dimensions to look at our measures, we have a pivot table.

Now we can see a picture emerging. It’s obvious that Andy must have been hired as a new salesperson in late 2008, since he shows no sales for 2007 and very small amount in 2008. But for Paul and Kimberly we can look at something called trends in the BI world. Kimberly shows a nice even trend, rising slowly over the last three years and earns a gold star as our top performer.

By being able to drill down into our data, we spot another trend that was not readily obvious when just looking at the grand totals. Paul has been trending downward so fast the speed of light looks slow. Clearly then we now have information to take action on, commonly known as actionable intelligence.

So remind me, why do we need PowerPivot?

As you can see in the above example, “that guy” in your company clearly has a need to look at this data in order to do his job. Not only does he need to review it, he also has the issue of how to share this information with his co-workers. Unfortunately in the past the tools available to “that guy” have had some drawbacks. The two main tools used by our analyst have been either Excel, or a complete BI solution involving a data warehouse and SQL Server Analysis Services.

Excel’s main limitations center around the volume of data needed to do good analysis. Excel has limits to the number of rows it can store, and for large datasets a spreadsheet can consume equally large amounts of disk space. This makes the spreadsheet difficult to share with coworkers. In addition mathematical functions like aggregations could be slow. On the good side, Excel is readily available to most workers, and a solution can be put together fairly quickly.

A full blown BI solution has some major benefits over the Excel solution. A data warehouse is created, and then SQL Server Analysis Services (often abbreviated as SSAS) is used to precalculate aggregations for every possible way an analyst might wish to look at them. The data is then very easy to share via tools like Excel and SQL Server Reporting Services. While very robust and powerful solution, it does have some drawbacks. It can take quite a bit of time to design, code, and implement both the data warehouse and the analysis services pieces of the solution. In addition it can also be expensive for IT to implement such a system.

Faster than a speeding bullet, more powerful than a locomotive, it’s PowerPivot!

PowerPivot combines the best of both worlds. In fact, it’s not one tool but two: PowerPivot for Microsoft Excel 2010, and PowerPivot for SharePoint 2010. What’s the difference you ask? Good question.

PowerPivot for Microsoft Excel 2010

PowerPivot acts as an Add-on for Excel 2010, and in many ways is quite revolutionary. First, it brings the full power of SQL Server Analysis Services right into Excel. All of the speed and power of SSAS is available right on your desktop. Second, it uses a compression technology that allows vast amounts of data to be saved in a minimal amount of space. Millions of rows of data can now be stored, sorted, and aggregated in a reasonable amount of disk space with great speed.

PowerPivot can draw its data from a wide variety of sources. As you might expect, it can pull from almost any database. Additionally it can draw data from news feeds, SQL Server Reporting Services, other Excel sheets, it can even be typed in manually if need be.

Another issue that often faces the business analyst is the freshness of the data. The information is only as good as the date it was last imported into Excel. Traditionally “that guy” only got extracts of the database as IT had time, since it was often a time consuming process. PowerPivot addresses this through its linked tables feature. PowerPivot will remember where your data came from, and with one simple button click can refresh the spreadsheet with the latest information.

Because PowerPivot sits inside Microsoft Excel, it not only can create basic pivot tables but has all the full featured functionality of Excel at its disposal. It can format pivot tables in a wide array of styles, create pivot charts and graphs, and combine these together into useful dashboards. Additionally PowerPivot has a rich set of mathematical functionally, combining the existing functions already in Excel with an additional set of functions called Data Analysis eXpressions or DAX.

PowerPivot for SharePoint 2010

PowerPivot for Excel 2010 clearly solves several issues around the issue of analysis. It allows users to quickly create spreadsheets, pivot tables, charts, and more in a compact amount of space. If you recall though, creation was only half of “that guys” problem. The other half was sharing his analysis with the rest of his organization. That’s where PowerPivot for SharePoint 2010 comes into play.

Placing a PowerPivot Excel workbook in SharePoint 2010 not only enables traditional file sharing, but also activates several additional features. First, the spreadsheet is hosted right in the web browser. Thus users who might not have made the transition to Excel 2010 can still use the PowerPivot created workbook, slicing and filtering the data to get the information they require.

Data can also be refreshed on an automated, scheduled basis. This ensures the data is always up to date when doing analysis. Dashboards can also be created from the contents of a worksheet and displayed in SharePoint. Finally these PowerPivot created worksheets can be used as data sources for such tools as SQL Server Reporting Services.

Limitations

First, let me preface this by saying as of this writing all of the components are either in CTP (Community Technology Preview, a pre-beta) or Beta state. Thus there could be some changes between now and their final release next year.

To use the PowerPivot for Excel 2010 components, all you have to have is Excel 2010 and the PowerPivot add-in. If you want to share the workbook and get all the rich functionality SharePoint has to offer, you’ll have to have SharePoint 2010, running Excel Services and PowerPivot 2010 Services. You’ll also have to have SQL Server 2008 R2 Analysis Services running on the SharePoint 2010 box. Since you’ll have to have a SQL Server instance installed to support SharePoint this is not a huge limitation, especially since SSAS comes with SQL Server at no extra cost.

One thing I wish to make clear, SharePoint 2010 itself can run using any version of SQL Server from SQL Server 2005 on. It is the PowerPivot service that requires 2008 R2 Analysis Services.

One other important item to note: at some point the load upon the SharePoint 2010 server may grow too large if especially complex analysis is being done. Fortunately SharePoint 2010 ships with several tools that allow administrators to monitor the load and plan accordingly. At the point where the load is too big, it is a clear indication it’s time to transition from a PowerPivot solution to a full BI solution using a data warehouse and SQL Server Analysis Services.

What does PowerPivot mean for business users?

For business users, and especially “that guy”, it means complex analysis tools can be created in a short amount of time. Rich functionality makes it easier to spot trends and produce meaningful charts and graphs. It also means this information can be shared with others in the organization easily, without imposing large burdens on the corporate e-mail system or local file sharing mechanisms.

No longer will users be dependent on IT for their analysis, they will have the power to create everything they need on their own, truly bringing “self service BI” to fruition.

What does PowerPivot mean for Business Intelligence IT Pros?

The first reaction many BI developers have when hearing about PowerPivot is “oh no, this is going to put me out of a job!” Far from it, I firmly believe PowerPivot will create even more work for BI Professionals like myself.

As upper management grows to rely on the information provided by PowerPivot, they will also begin to understand the true value BI can bring to an organization. Selling a new BI solution into an organization where none currently exists can be difficult, as it can be hard to visualize how such a solution would work and the value it brings. PowerPivot allows BI functionality to be brought into an organization at a low development cost, proving the value of BI with minimal investment. Thus when there is a need to implement a larger, traditional BI project those same managers will be more forthcoming with the dollars.

Second, as users pull more and more data, they are going to want that data better organized than they will find in their current transactional business systems. This will in turn spur the need to create many new data warehouses. Likewise the IT department will also want data warehouses created, to reduce the load placed on those same transactional business systems.

I also foresee PowerPivot being used by BI Pros themselves to create solutions. The database structure of many transactional database systems can be difficult to understand even for experienced IT people, much less users. BI Pros can use PowerPivot to add a layer of abstraction between the database and the users, allowing business analysts to do their job without having to learn the complexity of a database system.

BI Pros can also use PowerPivot to implement quick turnaround solutions for customers, bringing more value for the customer’s dollar. When a BI Pro can prove him (or her) self by providing rich functionality in a short time frame it’s almost always the case they are brought back in for multiple engagements.

PowerPivot also provides great value to BI Pros who are employed full time in an enterprise organization. They can create solutions much quicker than before, freeing them up to do other valuable tasks. In addition PowerPivot solutions can provide a “stop gap” solution, pushing the date at which the organization needs to spend the dollars for a full blown BI solution and allowing IT to plan better.

Finally I see great value in PowerPivot as a prototyping tool for larger BI projects. Now users can see their data, interact with it, analyze it, and ensure the required measures and dimensions are present before proceeding with the larger project.

I’ll reiterate, if anything I believe PowerPivot will create an explosion of work for the Business Intelligence Professional.

Where can I learn more?

Well right here for one. I have become quite interested in PowerPivot since seeing it at the SQL PASS 2009 Summit. I think it will be a valuable tool for both myself and my customers. This will be the first of many blog posts to come on PowerPivot. I am also beginning a series of presentations on PowerPivot for local user groups and code camp events. The first will be Saturday, November 21st 2009 at the SharePoint Saturday in Birmingham Alabama, but there will be many more to come. (If you’d like me to come speak at your group just shoot me an e-mail and we’ll see what we can arrange.)

There’s also the PowerPivot site itself:

I’ve also found a small handful of blogs on PowerPivot, listed in no particular order:

Summary

Thanks for sticking with me, I know this was a rather long blog post but PowerPivot has a lot of rich functionality to offer. While PowerPivot is still in the CTP/Beta stage as of this writing, I see more and more interest in the community, which will continue to grow as PowerPivot moves closer to release. I hope this post has set you off on the right step and you’ll continue to come back for more information.

Full Text Searching a FILESTREAM VARBINARY (MAX) Column

In the past I’ve written that Full Text Searching has the ability to index documents stored in a VARBINARY(MAX) field. However, I have never really gone into any details on how to do this. Today I will remedy that by demonstrating how to Full Text Seach not only using a VARBINARY(MAX) field, but one that has been stored using FILESTREAM. Even though these examples will be done against the data we’ve stored with FILESTREAM over the lessons from the last few days, know that this technique is identical for binary objects stored in a VARBINARY(MAX) field without using FILESTREAM.

Let’s start by creating a catalog to hold our Full Text data.


CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT;

Pretty normal, now we need to create a full text index on the “DocumentRepository” table we created in this series. When you look at the syntax though, you may notice a minor difference from the CREATE FULLTEXT INDEX examples I’ve shown in the past:


CREATE FULLTEXT INDEX ON dbo.DocumentRepository
(DocumentName, Document TYPE COLUMN DocumentExtension)
KEY INDEX PK__Document__3214EC277F60ED59
ON FileStreamFTSCatalog
WITH CHANGE_TRACKING AUTO;

Here you can see I am indexing two fields. The first is the “DocumentName”, which is passed in as the first parameter and looks like other examples. We won’t actually be using it in this example, however I included it to demonstrate you can index multiple columns even when one of them is a VARBINARY(MAX) column.

The second parameter indexes the VARBINARY(MAX) “Document” column itself, but notice the TYPE COLUMN after the column name. In order to Full Text Index a VARBINARY(MAX) column you must also have a column with the file extension in it. You then pass in the name of column after the TYPE COLUMN. In this example, the document extension is stored in the “DocumentExtension” column. Since the document extension can be stored in a column with any name, we let the Full Text engine know which column by passing it in after the TYPE COLUMN keyword. The remainder of the command is like other examples I’ve shown in the past.

Now we can run a normal SELECT…CONTAINS query against the “Document” field.


SELECT ID, DocumentName
FROM dbo.DocumentRepository
WHERE CONTAINS(Document, 'Shrew');

I’ll leave it to you to run, for me it returned one row, with “TheTamingOfTheShrew.doc”. If you want to try it again, use “Elinor”, and you should get back “KingJohn.doc”.

As you can see, performing a Full Text Search against a VARBINARY(MAX) column is quite easy, all you have to do is indicate the document type by using the TYPE COLUMN. There are two more things you should know. First, the column containing the document extension must be of type CHAR, NCHAR, VARCHAR, or NVARCHAR. Second, the document type must be recognized by SQL Server. To get a list of all valid document types, simply query the fulltext_document_types catalog view like so:


SELECT * FROM sys.fulltext_document_types;

This will give you a list of all file extensions understood by SQL Server. Each row actually represents a filter. Each filter represents a DLL that implements the IFilter interface. It is possible to add additional filters to the system. For example, Microsoft offers the “Microsoft Filter Pack”. You may have noticed that out of the box SQL Server 2008 supports the older Office 2003 documents, but not the more recent Office 2007 formats. To add these newer formats to your SQL Server, Microsoft provides the afore mentioned filter pack. While installing it is beyond the scope of this aritcle you can find complete instructions for downloand and installation at http://support.microsoft.com/default.aspx?scid=kb;en-us;945934 .

The Full Text Search features provided by SQL Server continue to amaze me with how powerful they are, yet how easy they are to implment. With the information here you can easily search through documents stored in a VARBINARY(MAX) field, even when those documents are actually stored via the new SQL Server 2008 FILESTREAM.

Accessing FILESTREAM Data From A Client .NET Application – Part 2 Downloading a File

In the previous entry we covered how to upload a file to SQL Server using the FILESTREAM, new to SQL Server 2008. In this post we will look at retrieving a file from SQL Server using FILESTREAM. If you missed yesterday’s installment, a simple front end was created, the full project can be found at the Code Gallery site http://code.msdn.microsoft.com/FileStreamFTS .

The interface is very simple:

image

The grid is a Data View Grid that shows the ID and Document information from the table we previously created. (If you want to see the code to populate the grid see the project.) The user picks a row, then clicks on the Get File button.

    private void btnGetFile_Click(object sender, EventArgs e)

    {

      // Reset in case it was used previously

      lblStatus.Text = "";

 

      if (dgvFiles.CurrentRow != null)

      {

        // Grab the ID (Primary Key) for the current row

        int ID = (int)dgvFiles.CurrentRow.Cells[0].Value;

        // Now Save the file to the folder passed in the second

        // paramter.

        FileTransfer.GetFile2(ID, @"D:\Docs\Output\");

        // And let user know it’s OK

        lblStatus.Text = "File Retrieved";

      }

    }

The code is very simple, the heart of it is the FileTransfer.GetFile static method. Two values are passed in, the integer ID, which is the primary key from the database, and the path to save the file to. Here I simply hard coded a path, in a real life application you will want to give the user the ability to enter a path. Let’s take a look at the GetFile routine.

    public static void GetFile(int ID, string outputPath)

    {

      // Setup database connection

      SqlConnection sqlConnection = new SqlConnection(

                "Integrated Security=true;server=(local)");

 

      SqlCommand sqlCommand = new SqlCommand();

      sqlCommand.Connection = sqlConnection;

 

      try

      {

        sqlConnection.Open();

 

        // Everything we do with FILESTREAM must always be in

        // the context of a transaction, so we’ll start with

        // creating one.

        SqlTransaction transaction

          = sqlConnection.BeginTransaction("mainTranaction");

        sqlCommand.Transaction = transaction;

 

        // The SQL gives us 3 values. First the PathName() method of

        // the Document field is called, we’ll need it to use the API

        // Second we call a special function that will tell us what

        // the context is for the current transaction, in this case

        // the "mainTransaction" we started above. Finally it gives

        // the name of the document, which the app will use when it

        // creates the document but is not strictly required as

        // part of the FILESTREAM.

        sqlCommand.CommandText

          = "SELECT Document.PathName()"

          + ", GET_FILESTREAM_TRANSACTION_CONTEXT() "

          + ", DocumentName "

          + "FROM FileStreamFTS.dbo.DocumentRepository "

          + "WHERE ID=@theID ";

 

        sqlCommand.Parameters.Add(

          "@theID", SqlDbType.Int).Value = ID;

 

        SqlDataReader reader = sqlCommand.ExecuteReader();

        if (reader.Read() == false)

        {

          throw new Exception("Unable to get BLOB data");

        }

 

        // OK we have some data, pull it out of the reader into locals

        string path = (string)reader[0];

        byte[] context = (byte[])reader[1];

        string outputFilename = (string)reader[2];

        int length = context.Length;

        reader.Close();

 

        // Now we need to use the API we declared at the top of this class

        // in order to get a handle.

        SafeFileHandle handle = OpenSqlFilestream(

          path

          , DESIRED_ACCESS_READ

          , SQL_FILESTREAM_OPEN_NO_FLAGS

          , context

          , (UInt32)length, 0);

 

        // Using the handle we just got, we can open up a stream from

        // the database.

        FileStream databaseStream = new FileStream(

          handle, FileAccess.Read);

 

        // This file stream will be used to copy the data to disk

        FileStream outputStream

          = File.Create(outputPath + outputFilename);

 

        // Setup a buffer to hold the streamed data

        int blockSize = 1024 * 512;

        byte[] buffer = new byte[blockSize];

 

        // There are two ways we could get the data. The simplest way

        // is to read the data, then immediately feed it to the output

        // stream using it’s Write feature (shown below, commented out.

        // The second way is to load the data into an array of bytes

        // (here implemented using the generic LIST). This would let

        // you manipulate the data in memory, then write it out (as

        // shown here), reupload it to another data stream, or do

        // something else entirely.

        // If you want to go the simple way, just remove all the

        // fileBytes lines and uncomment the outputStream line.

        List<byte> fileBytes = new List<byte>();

        int bytesRead = databaseStream.Read(buffer, 0, buffer.Length);

        while (bytesRead > 0)

        {

          bytesRead = databaseStream.Read(buffer, 0, buffer.Length);

          //outputStream.Write(buffer, 0, buffer.Length);

          foreach (byte b in buffer)

            fileBytes.Add(b);

        }

 

        // Write out what is in the LIST to disk

        foreach (byte b in fileBytes)

        {

          byte[] barr = new byte[1];

          barr[0] = b;

          outputStream.Write(barr, 0, 1);

        }

 

        // Close the stream from the databaseStream

        databaseStream.Close();

 

        // Write out the file

        outputStream.Close();

 

        // Finally we should commit the transaction.

        sqlCommand.Transaction.Commit();

      }

      catch (System.Exception ex)

      {

        MessageBox.Show(ex.ToString());

      }

      finally

      {

        sqlConnection.Close();

      }

      return;

 

    }

The routine kicks off by opening a connection, then establishing a transaction. Remember from the previous lesson that every time you work with a FILESTREAM it has to be in a transaction. Next we basically duplicate the SQL used in the previous lesson, returning the path name, transaction context, and document name. The only difference is we pass in the ID as a parameter. With that, just like with the previous example we call the OpenSqlFilestream API. Note a difference, in this example the second parameter is “DESIRED_ACCESS_READ” as opposed to the write access we indicated previosly.

Once we have the “handle” we can create a FileStream for reading from the database. In this example I loop through the file stream, loading the data into a LIST of bytes. Once in memory we are free to work with it as we need to. In this example I simply loop back through the generic List and write the data to the file stream we opened on the disk for writing. If all you are doing is writing, it would be somewhat more efficient to write the code like so:

        int bytesRead = databaseStream.Read(buffer, 0, buffer.Length);

        while (bytesRead > 0)

        {

          bytesRead = databaseStream.Read(buffer, 0, buffer.Length);

          outputStream.Write(buffer, 0, buffer.Length);

        }

 

        // Close the stream from the databaseStream

        databaseStream.Close();

I simply eliminate the local byte array and write the buffer directly to the disk. Either way, the remainder is simple, just closing all the streams, commiting the transaction and closing the database connection.

This concludes the series on how to use FILESTREAM, in future posts we look into how to do Full Text Search with FILESTREAM stored objects.

Accessing FILESTREAM Data From A Client .NET Application – Part 1 Uploading a File

The best way to work with documents in a database is via a .Net application. I created a simple Windows forms project to access the table I created in previous lessons. I named the application FileLoader, you can the entire project at the Code Gallery site http://code.msdn.microsoft.com/FileStreamFTS .

The interface is very simple:

image

As you can see there are two main functions, the upper half uploads a file to the SQL Server. The lower half displays the files already in the table, lets the user pick one and then click the GetFile button to save it locally. Today we’ll look at the Upload File functionality. Here is the code:

    private void btnUploadFile_Click(object sender, EventArgs e)

    {

      // Reset in case it was used previously

      lblStatus.Text = "";

 

      // Make sure user entered something

      if (txtFile.Text.Length == 0)

      {

        MessageBox.Show("Must supply a file name");

        lblStatus.Text = "Must supply file name";

        return;

      }

 

      // Make sure what user entered is valid

      FileInfo fi = new FileInfo(txtFile.Text);

      if (!fi.Exists)

      {

        MessageBox.Show("The file you entered does not exist.");

        lblStatus.Text = "The file you entered does not exist.";

        return;

      }

 

      // Upload the file to the database

      FileTransfer.UploadFile(txtFile.Text);

 

      // Refresh the datagrid to show the newly added file

      LoadDataGridView();

 

      // Let user know it was uploaded

      lblStatus.Text = fi.Name + " Uploaded";

    }

The real line of importance is the FileTransfer.UploadFile. This calls a static method in a class I named FileTransfer.cs. In order to use FILESTREAM there is an API call we have to make, so at the header area of the FileTransfer we have a lot of declarations. These are pretty much a straight copy from the MSDN help files.

    //These contants are passed to the OpenSqlFilestream()

    //API DesiredAccess parameter. They define the type

    //of BLOB access that is needed by the application.

 

    const UInt32 DESIRED_ACCESS_READ = 0×00000000;

    const UInt32 DESIRED_ACCESS_WRITE = 0×00000001;

    const UInt32 DESIRED_ACCESS_READWRITE = 0×00000002;

 

    //These contants are passed to the OpenSqlFilestream()

    //API OpenOptions parameter. They allow you to specify

    //how the application will access the FILESTREAM BLOB

    //data. If you do not want this ability, you can pass in

    //the value 0. In this code sample, the value 0 has

    //been defined as SQL_FILESTREAM_OPEN_NO_FLAGS.

 

    const UInt32 SQL_FILESTREAM_OPEN_NO_FLAGS = 0×00000000;

    const UInt32 SQL_FILESTREAM_OPEN_FLAG_ASYNC = 0×00000001;

    const UInt32 SQL_FILESTREAM_OPEN_FLAG_NO_BUFFERING = 0×00000002;

    const UInt32 SQL_FILESTREAM_OPEN_FLAG_NO_WRITE_THROUGH = 0×00000004;

    const UInt32 SQL_FILESTREAM_OPEN_FLAG_SEQUENTIAL_SCAN = 0×00000008;

    const UInt32 SQL_FILESTREAM_OPEN_FLAG_RANDOM_ACCESS = 0×00000010;

 

    //This structure defines the format of the final parameter to the

    //OpenSqlFilestream() API.

 

    //This statement imports the OpenSqlFilestream API so that it

    //can be called in the Main() method below.

    [DllImport("sqlncli10.dll", SetLastError = true, CharSet = CharSet.Unicode)]

    static extern SafeFileHandle OpenSqlFilestream(

                string Filestreamath,

                uint DesiredAccess,

                uint OpenOptions,

                byte[] FilestreamTransactionContext,

                uint FilestreamTransactionContextLength,

                Int64 AllocationSize);

 

    //This statement imports the Win32 API GetLastError().

    //This is necessary to check whether OpenSqlFilestream

    //succeeded in returning a valid / handle

 

    [DllImport("kernel32.dll", SetLastError = true)]

    static extern UInt32 GetLastError();

OK, with that out of the way, I’ve created a public, static method to upload the file. Here is the full routine:

    public static void UploadFile(string fileName)

    {

      // Establish db connection

      SqlConnection sqlConnection = new SqlConnection(

                "Integrated Security=true;server=(local)");

      SqlTransaction transaction = null;

 

      // Create a File Info object so you can easily get the

      // name and extenstion. As an alternative you could

      // choose to pass them in,  or use some other way

      // to extract the extension and name.

      FileInfo fi = new FileInfo(fileName);

 

      try

      {

        // Open the file as a stream

        FileStream sourceFile = new FileStream(fileName

          , FileMode.OpenOrCreate, FileAccess.Read);

 

        // Create the row in the database

        sqlConnection.Open();

 

        SqlCommand cmd = new SqlCommand();

        cmd.Connection = sqlConnection;

        cmd.CommandText = "INSERT INTO "

          + "FileStreamFTS.dbo.DocumentRepository"

          + "(DocumentExtension, DocumentName) VALUES (‘"

          + fi.Extension + "’, ‘"

          + fi.Name + "’)";

        cmd.ExecuteNonQuery();

 

        // Now upload the file. It must be done inside a transaction.

        transaction = sqlConnection.BeginTransaction("mainTranaction");

        cmd.Transaction = transaction;

        cmd.CommandText = "SELECT Document.PathName(), "

         + "GET_FILESTREAM_TRANSACTION_CONTEXT() "

         + "FROM FileStreamFTS.dbo.DocumentRepository "

         + "WHERE ID=(select max(id) from FileStreamFTS.dbo.DocumentRepository)";

        SqlDataReader rdr = cmd.ExecuteReader();

        if (rdr.Read() == false)

        {

          throw new Exception("Could not get file stream context");

        }

 

        // Get the path

        string path = (string)rdr[0];

        // Get a file stream context

        byte[] context = (byte[])rdr[1];

        int length = context.Length;

        rdr.Close();

 

        // Now use the API to get a reference (handle) to the filestream

        SafeFileHandle handle = OpenSqlFilestream(path

          , DESIRED_ACCESS_WRITE

          , SQL_FILESTREAM_OPEN_NO_FLAGS

          , context, (UInt32)length, 0);

 

        // Now create a true .Net filestream to the database

        // using the handle we got in the step above

        FileStream dbStream = new FileStream(handle, FileAccess.Write);

 

        // Setup a buffer to hold the data we read from disk

        int blocksize = 1024 * 512;

        byte[] buffer = new byte[blocksize];

 

        // Read from file and write to DB

        int bytesRead = sourceFile.Read(buffer, 0, buffer.Length);

        while (bytesRead > 0)

        {

          dbStream.Write(buffer, 0, buffer.Length);

          bytesRead = sourceFile.Read(buffer, 0, buffer.Length);

        }

 

        // Done reading, close all of our streams and commit the file

        dbStream.Close();

        sourceFile.Close();

        transaction.Commit();

 

      }

      catch (Exception e)

      {

        if (transaction != null)

        {

          transaction.Rollback();

        }

        throw e;

      }

      finally

      {

        sqlConnection.Close();

      }

 

    }

First we open a connection to the SQL Server, then create a FileInfo object to make it simple to extract the file name and extension. Next a record is inserted into the database that will act as a place holder. It has the name of the file and the extension, but no file yet. I did go ahead and open a FileStream to the source file, located on the disk. We’ll need this later to upload the file.

Next you will see that I begin a transaction. Every time you work with a FILESTREAM it must always be in the context of a transaction. After that a SQL Data Reader is created that has three pieces of information. First, it calls the PathName() function for the Document field in our table. The PathName() will be needed later when we call the API. The second field is returned from the GET_FILESTREAM_TRANSACTION_CONTEXT function, and returns the transaction context for the transaction. Note this is not the name (in this example “mainTransaction”), but the context which is a special value. These two values are then copied into local variables which will be used in calling the OpenSqlFilestream API. In this example I also retrieve the DocumentName field, this is used by the code when it writes the file to the database, but is not strictly needed for the FILESTREAM.

Next you will see the call to the OpenSqlFilestream API, which returns a “handle”. This handle is then used to create a FileStream object. Using this newly created FileStream (here named dbStream) we can then upload the file. Now the main work begins. After setting up a buffer, we then simply read from the source file stream into the buffer, then write the exact same buffer to the database FileStream. The loop continues until there are no more bytes in the source.

At this point we are essentially done. We close the streams, commit the transaction, and in the finally block close the SQL database connection. The file should now be in the database. I do want to point out one thing. In the SQL to get the information to the row just uploaded, I use a subquery to get the max(id), essentially returning the last row just inserted. This is fine for this simple example, when the database has just one user. In your production systems where you are likely to have many users, however, you should use an alternate method to return the row you need. Otherwise two users could insert rows at the same time, and thus a conflict could occur with both of them getting back the same max(id). It will not happen often, but at some point it could happen and be very hard to debug.

This handled the uploading of files to the SQL Server via FILESTREAM, in the next installment we’ll look at how to retrieve the file we just uploaded.

SQL Server Sample Data – The SQL Name Game

Like most folks, I seem to have a perpetual need for realistic test data. While there are many databases available, sometimes the need is quite simple. All I need is some names, perhaps dates and phone numbers that can be used for testing my applications, SSIS or SQL Server Reports. I decided to take care of this need once and for all, and set out with a simple goal. At the conclusion of my work I wanted to wind up with a realistic looking, but totally fake set of data. I wanted to do it in the simplest means possible, using whatever tools I had available. Finally, I wanted to do it as quickly as possible.

Along the way I documented my efforts, as well as created a sample table with 100,000 rows. When I started I thought to publish everything in a blog post, but it turned out to be far too much for a single blog post. Thus I decided to document everything in a white paper, and upload all the code to a MSDN Code Gallery site. Note that while I used the 2008 versions of SQL Server and Visual Studio, the SQL Scripts should run just fine with SQL Server 2005.

You can find everything at http://code.msdn.microsoft.com/SqlServerSampleData . Look in the downloads section for the complete PDF with all the details, as well as all of the sample data. Using the techniques outlined in the white paper you too could easily be generating your own test data for a wide variety of projects.

Tracking down SQL Server Integration Services issues with Collation

At work I’ve been developing a big suite of packages to convert data from an Oracle system into our Data Warehouse. A lot of the supporting tables are almost a straight pull, except we are of course adding our own primary key, and then setting up a non clustered unique index on what had been the primary key in the old system.

A few tables have been driving us batty though, giving us “duplicate value” errors when trying to insert the rows from our SSIS package. The first thing I did to try and track down the problem was create an error table, and instead of having the package fail have it redirect error rows to my new error table. In case you are wondering, this is going to be a “one time shot” use for the packages, so we chose not to invest a lot of time and effort into error handling. We either want all the rows or none, and we’ll be running the packages manually so we’ll be there to know the results. But I digress.

When I went to look at the error table, it had all the rows from our source system in it. I scratched my head, thinking that can’t be right. A quick search found the answer in the Technet Forums. I needed to go into the OLE DB Destination and set the Max Commit count to 1. Of course you wouldn’t want to leave it like that for production, but for debugging it worked great. Once I did that, I was able to rerun the package and quickly identify my misbehaving row.

Next I looked at the value, and then looked for a similar value in my table. What I found was my source system had two rows, something like this example:

Arcane Code

Arcane code

Yes, the only difference was the second row had a lowercase letter at the beginning of the second word. Our Oracle instance had case sensitivity turned on. To it, these were two entirely different values. However, by default SQL Server is case insensitive; to it these two were the same. So my dilemma, how to fix this one column without having to alter my entire database?

It turns out there is an option in the Create Table syntax to set the collation. First, you should find out what your collation is currently set to. This is easy enough, just open SQL Server Management Studio, right click on the database and pick properties. Right there on the front page is the Collation.

image

Alternatively I could have run this SQL in SSMS (substitute your database name where I have AdventureWorks2008):

select databasepropertyex('AdventureWorks2008', 'collation')

Either way, in this example the default is SQL_Latin1_General_CP1_CI_AS. The important thing to note is the “_CI_”, which indicates case insensitivity. If we wanted to set the entire database, we would issue commands to change this to SQL_Latin1_General_CP1_CS_AS, which stands for case sensitivity. But as I said, in my case I don’t want to affect the entire database, so instead I will use this collation name in the create table syntax. Here is a simple example:

create table TestTable
(
  BogusPK bigint identity
  , FieldFromOracle varchar(200) collate SQL_Latin1_General_CP1_CS_AS not null
  , AnotherField varchar(200) null
)

All that I had to do was insert the collate clause between the data type and the not null clause. Note that this only affects the one column I had an issue with. FieldFromOracle is now a case sensitive column, I can add “Arcane Code” and “Arcane code” and still be able to add a unique index. The second column, here named “AnotherField” will remain case insensitive, the behavior you normally expect.

Before I wrap this up, I know someone will point out that allowing primary keys in your system that only differ in case is bad practice. For the record I totally agree, however this is a soon to be legacy system built by a vendor. Additionally, for various reasons I was not allowed to do any data cleansing to the source system. Just pull it like it is and put it in the warehouse. I imagine most of you are like me, that you don’t get to live in the ideal world, so hopefully knowing how to diagnose and deal with collation issues between databases will make your life a little easier.

Using a Local Reporting Services 2008 Report with an ADO.NET Data Set

SQL Server Reporting Services is an incredibly full featured reporting tool. An often asked question though is “How can I use Reporting Services without setting up a full SQL Server just to run Reporting Services?”

Fortunately the folks at Microsoft thought of this, and created a version of Reporting Services that runs in Local, or what Microsoft calls Client mode. There are several ways to use client mode, you can bind the report right to the database if you wish, or to an object. However if you have an application that’s been around for a bit, or maybe you’ve been around for a bit, chances are you have a lot of ADO.NET DataSets you’d love to use for data in a report. Today we’ll look at how to bind those data sets to a SQL Server Reporting Services report.

Let’s say you have created an application to work with the AdventureWorks2008 database. You user has now asked you for one last feature. They wish to display a list of Vendors in a report. They want to preview the report, print it, and be able to export it to a PDF format.

Based on your experience you know that SQL Server Reporting Services would be a good choice. However your client does not have an instance of SQL Server Reporting Services running in their corporation. Thus the path is clear, use SQL Server Reporting Services in Client mode.

Preliminary Work

Prior to beginning our work, we’ll need to do two basic setup steps. First, if you don’t have it already, you will need to download the Adventure Works 2008 database from CodePlex. Install it following their instructions. Here is the current location for AdventureWorks2008:

http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=18407

Now open up Visual Studio 2008 and create a new C# WinForms application. Note that while the ReportViewer control we’ll be using works fine in WinForms, ASP.Net, or WPF, for simplicity we’ll use a WinForms application. Give your project a meaningful name (I used ReportingServicesLocal).

Create the DataSource

Normally Reporting Services knows what tables and columns are available because you have setup a connection to a database. In this scenario however, we are going to bind the report to an in memory ADO.NET DataTable.

At design time then Reporting Services does not know about the DataSet, and so we must create a surrogate for Reporting Services. Thus we’ll create a special type of XML schema definition to stand in for our not yet created DataSet.

To accomplish this, first we need to create the Data Source schema by following these steps:

1. Right click on the project in the Solution Explorer window.

2. Select Add, New Item.

3. Click on the Data leaf of the Visual C# Items branch in the Add New Item window.

4. Pick the DataSet item. Give it a meaningful name, such as VendorList.xsd.

clip_image002

Now we need to add a table to the DataSet.

1. In your toolbox, under DataSet find the Data Table tool and drag it onto the design surface.

2. Click the DataTable1 and rename it to Vendors.

The last step in the process is to add our columns to the Vendors DataTable we just created.

1. Right click on the name of your DataTable and pick Add, Column from the pop up menu.

2. For the first column type in VendorName. Note that if we needed to, we could now go to the properties window and change the DataType to something other than the default of System.String. For this lab, everything we’ll use is a string so this won’t be needed.

3. Repeat step 2, adding these column names: AddressLine1, AddressLine2, City, StateProvinceName, PostalCode

When done it should look like:

clip_image004

Create the Report.

Now that we have a schema, we’re ready to create the report and add the components to it. To create the report, follow these basic steps.

1. Right click on the project and select Add, New Item.

2. In the pop up window, go to the Reporting leaf under the Visual C# branch.

3. Pick “Report”, and give the report a meaningful name such as VendorList.rdlc.

clip_image006

Now that the report is created, we need to add the components and data columns to it.

1. With the blank report, drag a Table control onto the report body.

2. Open the Data Sources window by selecting Data, Show Data Sources from the Visual Studio menu.

3. You should see the VendorList DataSet, under it the Vendors DataTable, and under it the columns.

clip_image008

4. Drag the VendorName to the first column of the table. Next, drag the City to the second column, and the StateProvinceName to the third.

5. Right click on the column header for StateProvinceName and pick “Insert Column to the Right”.

6. Drag the PostalCode to this newly inserted column. Your report should now look something like:

clip_image010

Adding the Report Viewer to the Windows Form

Now that the setup tasks are complete, it’s time to get to the user interface ready. First we’ll do some basic setup of the form.

1. When you created the basic project Visual Studio created a default Windows Form, Form1.cs. Start by changing the Text property to read “Report Viewer”.

2. While we’re at it, let’s change the (Name) property to frmReportViewer.

Now add the Report Viewer control to the form.

1. In the toolbox, navigate to the Reporting area, and drag a MicrosoftReportViewer control onto the form. Resize so it takes up the lower 90% or so of the form.

2. Change the name to rvwMain (or something meaningful).

Next add a button to the form. We’ll use it to trigger the report.

1. From the Common Controls area of the toolbox, drag a button control onto the form.

2. Change the (Name) property to btnDataSet.

3. Change the Text property to DataSet.

4. Double Click on the button to open up it’s code behind.

We’ll be supplying the data to the ADO.Net dataset using SQL Server, so we need to go to the top of the form and add a reference to the System.Data.SqlClient.

using System.Data.SqlClient;

Now let’s go into the btnDataSet_Click event, and add some code to fill our dataset. This code snippet will bind to our local SQL Server, create a command to do a simple select statement to a view, and fill the dataset.

  /* Fill the Dataset ------------------------------------*/
  string qry = "select v.Name as VendorName "
                  + ", v.AddressLine1 "
                  + ", v.AddressLine2 "
                  + ", v.City "
                  + ", v.StateProvinceName "
                  + ", v.PostalCode "
               + "from Purchasing.vVendorWithAddresses v "
              + "order by v.Name ";

  string connectionstring = @"Server=(local);"
    + "Database=AdventureWorks2008;Trusted_Connection=True;";

  SqlConnection connection = new SqlConnection(connectionstring);
  SqlCommand cmd = new SqlCommand(qry, connection);

  SqlDataAdapter daVendor = new SqlDataAdapter();
  daVendor.SelectCommand = cmd;
  DataSet dsVendors = new DataSet();
  daVendor.Fill(dsVendors);

Note that in the first line of our select statement, we had to use v.Name as VendorName. The column names we return from our dataset must match the column names we entered in the Data Source back in Exercise 2 Step 3. Fortunately SQL easy to use “AS” syntax makes this simple.

Also, even though in this example we use SQL Server, the connection could be to any Data Source such as MySQL or Oracle. The important thing is we wind up with a DataSet to bind to.

In the same btnDataSet_Click method we now need to tell the report viewer control which report to run, then where to get it’s data. To tell the ReportViewer control to use a local report (as opposed to a report residing on a Reporting Services Server) we need to set the ReportEmbeddedResource property.

  rvwMain.LocalReport.ReportEmbeddedResource = "ReportingServicesLocal.VendorList.rdlc";

Note the format of the string we pass in. It has the name of the project, then a dot, then the name of the report complete with it’s rdlc extension. You should also know this is case sensitive.

Now we need to tell the report where our data really is. To do this, we’ll tell it to bind the Vendor data table from the VendorList data source to the dataset we generated in the step above.

  rvwMain.LocalReport.DataSources.Add(
    new Microsoft.Reporting.WinForms.ReportDataSource(
    "VendorList_Vendors", dsVendors.Tables[0]));

We need to create a new ReportDataSource to pass into the Add method of the reports DataSources. In the constructor for the ReportDataSource we pass in two parameters. The first is the name of the DataTable we are binding to.

Note that it’s syntax is a bit odd, you have to address it with first the DataSet name, then use an underscore to append the name of the specific DataTable.

The second parameter is the specific table from the dataset to bind to. Since we only had 1 we can use the simple .Tables[0] syntax shown here. We could have also given it a specific name.

One final note, in this simple example we are only binding one data source. However it’s possible for reports to have multiple data tables contained in them. To bind each one, we would simply have created a data table for each in the XSD, then added the code to the step above to read each one in, then bound them in this step by repeating this line of code for each one.

Finally we’re ready to display the report. Simply add this line to trigger the generation of the report.:

      rvwMain.RefreshReport();

Test your application.

Everything is now setup, you should be ready to run.

1. Launch the app from Visual Studio.

2. Once open, click on the DataSet button.

3. Your screen should look something like:

clip_image012

 

And there you go, you too can now easily create nice looking reports from your existing ADO.NET datasets.