Tag Archives: SQL Server

Deep Fried Arcane

At TechEd last year I was interviewed by the Deep Fried Bytes guys, along with another great SQL guy Denny Cherry. The topic of our interview was What Should Developers Know About SQL Server. (Click the link for the show.)

In the interview we cover SQL Server Full Text Search, SQL Server Service Broker, and SQL Server Integration Services. And if you listen, you’ll hear about my favorite deep fried food!

SSRS Quick Tip – An item with the same key has already been added

I was in the process of creating a new report in SQL Server Reporting Services today. I was loading my dataset from a stored procedure, and when I hit the “Refresh Fields” button I received the following error:

“Could not create a list of fields for the query. Verify that you can connect to the data source and that your query syntax is correct.”

When I clicked the details button I got this further information:

“An item with the same key has already been added.” Here’s a screen shot of my error.

Well this had me scratching my head, as I had made sure to run the stored procedure, and it executed with no errors. After doing some considerable research I finally found a question in the Technet forums that was tangentially related to the error. This gave me the clue to figure out what I had done.

In my stored procedure, I had inadvertently included the same column name from two different tables. My query looked something like:

SELECT a.Field1, a.Field2, a.Field3, b.Field1, b.field99
FROM TableA a JOIN TableB b on a.Field1 = b.Field1

SQL handled it just fine, since I had prefixed each with an alias (table) name. But SSRS uses only the column name as the key, not table + column, so it was choking.

The fix was easy: either alias the second column (i.e. b.Field1 AS Field01) or just omit the field altogether, which is what I did.
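
For reference, here’s roughly what the corrected query looks like with the alias applied:

SELECT a.Field1, a.Field2, a.Field3, b.Field1 AS Field01, b.field99
FROM TableA a JOIN TableB b ON a.Field1 = b.Field1

With every column name in the result set now unique, SSRS can build its field list without complaint.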

As it took me a while to figure this out, I thought I’d pass it along to anyone else who might be looking.

SQL Server Integration Services for Developers

Today I presented SSIS For Developers. We looked at how SSIS, commonly used in data warehousing, can also be used by most developers to solve issues that frequently come up in the course of their jobs. Data conversion and exporting data are two good examples, and we also looked at how to call your new SSIS job from your .Net application.

There are two code demos used during the presentation, both available at my Code Gallery site. The first is the basic SSIS For Devs demo with the three packages. The second is the more complex example showing how to call SSIS from your .Net application.

Linked Subreports in SQL Server 2008 Reporting Services

Note, before getting started with this lesson there are some prerequisites you should know about. Please read my post Getting Started with SQL Server 2008 to ensure you have everything setup correctly, otherwise you will be missing objects required to get the code here to work correctly.

The previous lesson showed how to include a subreport in another report. This could be used to link independent reports together into a single report. It can also be useful to have a related subreport, one whose data is driven by that of the main report. This can be accomplished through the use of parameters.

For this lab we’ll create a subreport that returns category totals for the region passed in from the main report. Note that this is a greatly simplified example to illustrate the technique. Even though in this sample everything comes from the same database, each report could just as easily come from completely different data sources. Subreports would be a great technique for combining data from different systems.

Step 1. Create the subreport.

Use Contoso as the shared datasource. For the query, enter:

SELECT [Region]
     , [ProductCategoryName]
     , SUM([TotalAmount]) AS ProductTotal
  FROM [ContosoRetailDW].[Report].[V_SubcategoryRegionTotalsByYear]
 WHERE [Region] = @pRegion
 GROUP BY [Region], [ProductCategoryName]
 ORDER BY [Region], [ProductCategoryName]

Use a tabular report, move everything into the details area, Generic for the table style, and for the report name use “Subreport – Region Category Totals”.

Step 2. Cleanup the subreport.

Click the edge of the Region textbox in the header (so it’s selected instead of being edited), and press Delete. Repeat with the [Region] textbox in the detail row. We won’t need it since Region will be displayed on the parent report.

Change the other headers to Category and Total. Make them wider, and make what had been the Region column smaller but leave it; it will provide a nice indent padding when included on the parent report. In the textbox properties for ProductTotal, make sure to set the Number to Currency, and in the Alignment area change the Horizontal alignment to Right.

Remove the “Subreport – Region Category Totals” text box.

Click on the main table grid, then move it to the top of the body. Collapse the body to fit the table.

Step 3. Add the parameter.

In the Report Data window, right click on Parameters and pick Add Parameter. Name the parameter Region. For the prompt, enter “Region – Hidden”. Since the prompt will never be visible it really doesn’t matter, but making a habit of entering the name plus the word Hidden gives a clear indicator that this parameter is a hidden one.

Leave the data type set to text, and check on “Allow blank value”. If you don’t, the report will error out when used as a subreport. Next, set the visibility to Hidden. This means it won’t appear if you run the report, but you can still pass in parameters, from another report or via a URL. Click OK to close the properties window.

Finally, we need to bind the parameter to the parameter the dataset needs. Right click on the dataset and go to properties. On the parameters area @pRegion should already be present (remember, it was part of the WHERE clause in the SQL query). Pick @Region in the drop down for Parameter Value.

Step 4. Create the main report.

Add a new report, using Contoso as the shared datasource. For the query, use:

SELECT [RegionCountryName]
  FROM [Report].[V_Regions]

Use a tabular report, move the RegionCountryName to the details area, and pick Corporate for the style. Finally, for the name use “Regional Report”.

Step 5. Layout the report.

Since there’s only one column, expand it to take up the width of the body.

Right click on the row selector (the gray box with the lines on the left of the table) and pick Insert Row->Inside Group Below.

Into that area, drag a Subreport control from the toolbox. Note in this case there is only one column, but if there were multiple cells you could highlight them, right click and pick Merge Cells.

Step 6. Setup the subreport.

Right click on the subreport control.

Under “Use this report as a subreport” select the “Subreport – Region Category Totals”. Under the parameters area, click Add. Select Region under Name, and for the Value select RegionCountryName.

Step 7. Preview the report

Preview the report to see your results:

[Screenshot: preview of the report, showing the category totals subreport under each region]

Notes

Just a few notes. In this report, we left the table headers in the subreport (Category and Total). Often these are removed, to make the subreport blend in more with the parent.

Here only one parameter was passed, however you can pass multiple parameters if you need to.

Using Subreports as Areas of Your SQL Server Reporting Services Report

Note, before getting started with this lesson there are some prerequisites you should know about. Please read my post Getting Started with SQL Server 2008 to ensure you have everything setup correctly, otherwise you will be missing objects required to get the code here to work correctly.

Subreports are an incredibly useful concept within Reporting Services. They allow you to compartmentalize complex logic. They also allow you to create reports that can be used in many different parent reports.

In this lab, we’ll look at how to create a subreport and use it as a region within a parent report. For this example, we’ll create a base report, then a subreport that will function as an executive summary which we can place at the top of the report body. These types of summaries are commonplace in the reporting world.

Let’s get started by creating our base report. This will be identical to the base report used in other labs.

Step 1. Add the main report

As with our other reports, right click on the Reports branch in Solution Explorer, pick Add New Report, and (if you haven’t already disabled it) click next to move past the welcome screen.

Step 2. Set the data source.

Pick the Contoso shared data source, or setup a new source to Contoso, and click Next.

Step 3. Setup the query.

In the query builder, we’ll be using one of our views. Enter this SQL statement:

SELECT [FiscalYear]
      ,[ProductCategoryName]
      ,[ProductSubcategory]
      ,[Region]
      ,[TotalAmount]
  FROM [ContosoRetailDW].[Report].[V_SubcategoryRegionTotalsByYear]

and click next.

Step 4. Set the format.

For the report type we’ll use the simple Tabular format, so just click Next.

Step 5. Determine field placement in the report.

To keep this simple we’ll not use any groups on this report, so just put all report fields into the Details section. You can do it in one easy step by clicking on the top most item (FiscalYear), holding down the shift key, and clicking the bottom item (TotalAmount). This will select all of the fields, just click the Details button to move them. Then click Next.

Step 6. Select the formatting Style

Once again we’ll go with Corporate for the style and click Next.

Step 7. Name the report.

Finally we’ll give the report a name of “Regional Sales by Subcategory Subreports as Report Areas” and click Finish.

Step 8. Format report columns

To make the report a little easier to read, expand the width of the columns and format the Total Amount as Currency. (See the previous labs if you don’t recall how to accomplish this.)

Step 9. Add the subreport

It’s now time to create the subreport. Just like with a regular report, right click on the Reports branch in Solution Explorer, pick Add New Report, and (if you haven’t already disabled it) click next to move past the welcome screen.

Step 10. Set the data source.

Pick the Contoso shared data source, or setup a new source to Contoso, and click Next.

Step 11. Setup the query.

In the query builder, we’ll be using one of our views. Enter this SQL statement:

SELECT [ProductCategoryName]
      ,[CategoryTotal]
  FROM [ContosoRetailDW].[Report].[V_ProdcutCategoryExecutiveSummary]
 ORDER BY [ProductCategoryName]

and click next.

Step 12. Set the format.

For the report type we’ll use the simple Tabular format, so just click Next.

Step 13. Determine field placement in the report.

To keep this simple we’ll not use any groups on this report, so just put all report fields into the Details section.

Step 14. Select the formatting Style

Unlike other reports, we will pick Generic for the style and click Next.

Step 15. Name the report.

Finally we’ll give the report a name of “Subreport – Executive Summary” and click Finish. Note that it is common to start the names of subreports with the word “Subreport” to make them easier to find.

Step 16. Format subreport columns and body

To make the report a little easier to read, expand the width of the columns and format the Total Amount as Currency. (See the previous labs if you don’t recall how to accomplish this.)

In addition, we don’t need the body to be any wider than what is needed. Click on the text box that holds the body title “Subreport – Executive Summary” and shrink it to match the width of the table. Then hover the mouse over the right side of the report body and drag it over to bump against the right side of the table.

Gotcha: If you try to shrink the body first, it won’t move. The right edge of the body can never be less than the right edge of the object that extends farthest to the right.

Step 17. Setup the detail header

Start by changing the titles of the detail grid to “Product Category” and “Total”. Now highlight the entire row by clicking the gray row selector square to the left of the row.

We can change the foreground and background colors of this row to match those of the main report. You can pick from standard colors, or enter your own color value. As an example of the first, go to the Color property, and from the drop down pick the color white. You will see the property value change to “White”. You could also have chosen to just type in the word White.

You can also enter a hexadecimal value for the color. Click on the Background Color property and enter “#1c3a70”. (No quotes, but make sure to include the # so the entered value will be understood as hex and not a standard color name such as “White”.)

Note that you can also change the values of each textbox independently, using the same technique. Most commonly though you will want to set the entire row.

Step 18. The “Green Bar” effect

Once upon a time, in a computer room in the distant past, all reports were printed on paper that had alternating blocks of green and white background. This was known as “Green Bar” paper. The color made it easy to follow long lines of text across the page.

It’s possible to setup the same effect within our report today. Highlight the detail row, then in the Background Color of the properties window, click the drop down, then instead of a color pick the Expression option. For the expression, enter:

=iif((RowNumber(NOTHING) MOD 2) = 0, "LightBlue", "White")

Using the MOD function we determine whether it’s an odd or even row, and set the background color accordingly. For the colors, any color constant or hexadecimal value would work.

Step 19. Add a value for the body header.

When a report is used as a subreport, any headers or footers are ignored. It can be useful to have a nice title though, so in this step we’ll create one.

19.1 Hover over the bottom of the body, and drag it down to expand the body height.

19.2 Now click on the grid. When the grid row/column bars appear, click on the one in the very upper left corner. When you do, the row/column bars hide themselves, and the grid sizing handles appear. In the upper left is an icon that points up/down/left/right. Click on it and drag the grid down, leaving space at the top for a textbox. Also leave a little space at the bottom that can serve as a gap between it and other items that might appear on the main report we place this subreport on.

19.3 Next drag a textbox from the toolbox onto the top of the page. Expand the textbox to take up the width of the body. Increase the font size to 12, make the font bold, and center it.

19.4 We have a place now to put our title, so let’s grab some data to put there. Add a new dataset by right clicking on the Contoso data connection in the Report Data window.

19.5 Name it “CurrentFiscalYear”, and for the query text enter:

SELECT MAX(FiscalYear) AS CurrentYear
   FROM [Report].[V_ProdcutCategoryExecutiveSummary]

Click OK to save this new dataset.

19.6 Returning to the textbox, right click and pick Expression. For the expression text, enter:

="Executive Summary for " & Sum(Fields!CurrentYear.Value, "CurrentFiscalYear")

To build the center part of the string, click on the Datasets option under category. Then click on the CurrentFiscalYear dataset. In the Values area, one item appears, Sum(CurrentYear). Click on it to add the text to the current expression.

There is an oddity when getting fields from a dataset other than the main one that supplies data to the body: they must be wrapped in an aggregate expression such as Sum. However, since we are summing a single value here, the result is the same as the value itself. The finished subreport should look like:

[Screenshots: the subreport in Design Mode and in Preview Mode]

Step 20. Add subreport to main report.

Adding the subreport is quite simple. First, expand the body to make room above the grid similar to what was done in the above step. Then, drag the subreport from the Solution Explorer onto a blank area of the body.

Positioning it can be a bit of a pain; there’s no nice “put in the center” button. But with a little math you can accomplish it.

Return to the subreport for a moment, and click on the grid, which should take up the entire width of the body. In the properties window, expand the Size property to see the width. In this case it’s 2.3 inches.

Back in the main report, repeating the procedure with the main report’s grid, we see the width is 6.58 inches. Now it’s easy: (6.58 – 2.3) / 2 yields 2.14 inches. Use this for the Left property of the subreport. The width isn’t that important; just set it wide in this case.

Step 21. Preview the report.

 

[Screenshot: preview of the report with the executive summary subreport above the detail grid]

As you see, you now have an attractive subreport that you can reuse in multiple reports.

Report Headers and Footers

Note, before getting started with this lesson there are some prerequisites you should know about. Please read my post Getting Started with SQL Server 2008 to ensure you have everything setup correctly, otherwise you will be missing objects required to get the code here to work correctly.

Common features of most reports are headers and footers that describe the report and supply additional information such as page numbering or the print date. In this lab we’ll look at ways to customize the header and footer.

We’ll start by creating a basic report, then adding the headers and footers to it.

Step 1. Add the report

As with our other reports, right click on the Reports branch in Solution Explorer, pick Add New Report, and (if you haven’t already disabled it) click next to move past the welcome screen.

Step 2. Set the data source.

Pick the Contoso shared data source, or setup a new source to Contoso, and click Next.

Step 3. Setup the query.

In the query builder, we’ll be using one of our views. Enter this SQL statement:

SELECT [FiscalYear]
      ,[ProductCategoryName]
      ,[ProductSubcategory]
      ,[Region]
      ,[TotalAmount]
FROM [ContosoRetailDW].[Report].[V_SubcategoryRegionTotalsByYear]

and click next.

Step 4. Set the format.

For the report type we’ll use the simple Tabular format, so just click Next.

Step 5. Determine field placement in the report.

To keep this simple we’ll not use any groups on this report, so just put all report fields into the Details section. You can do it in one easy step by clicking on the top most item (FiscalYear), holding down the shift key, and clicking the bottom item (TotalAmount). This will select all of the fields, just click the Details button to move them. Then click Next.

Step 6. Select the formatting Style

Once again we’ll go with Corporate for the style and click Next.

Step 7. Name the report.

Finally we’ll give the report a name of “Regional Sales by Subcategory Headers and Footers” and click Finish.

Step 8. Format report columns

To make the report a little easier to read, expand the width of the columns and format the Total Amount as Currency. (See the previous labs if you don’t recall how to accomplish this.)

Previewing the report shows our data. With the base report in place, we’re ready to add the header and footer.

Step 9. Add a header area.

To add a header area to the report, simply right click anywhere outside the report body and select “Add Page Header”.

Step 10. Add a title.

A blank, white canvas should appear above your report body. Here you can create a header. Go to the toolbox and drag in a Text Box. In it enter “Regional Sales Report”. Click on the text box and grab the sizing handles to enlarge it. Sometimes this can be a little tricky: if you click inside the text box it assumes you want to enter or edit the text and puts you in edit mode. You have to click right on the edge of the text box area to make the sizing handles appear.

Now add some visual impact. Either right click to access the fonts or use the toolbar above the design area. Make the font bold, and bump it up a few sizes, 16 generally works well.

Step 11. Add page numbers.

Drag another text box into the area. This time instead of static text we’ll use an expression to put in page numbers. Position the text box in the upper right corner of the report.

Right click on the text box, and in the pop up menu pick “Expression”.

In the expression builder you have a blank slate, only the beginning = is supplied for you. Similar to Excel, all expressions must start with an = sign.

The expression builder is very full featured and powerful; you can do a lot of complex things with it. It uses a VB.Net-like language. In this lab though we’ll keep things simple and concatenate some static text and built-in variables to form a Page x of xx expression.

After the = sign enter “Page “ then an ampersand “&”. Page is simply static text, and the & will be used to join together our return value.

In the lower half of the Expression dialog you will see a Category and Item area, these are designed to make it easier to build expressions. Click on the “Built in Fields” Category. On the right the Item area will populate with the valid fields. Click on PageNumber.

Return to the upper area where it says “Set expression for Value” and after the page number type in & “ of “ & . Then go back to the Item list and click TotalPages. Your Expression dialog should now look like:

[Screenshot: the Expression dialog showing the completed Page x of xx expression]

Click OK to close the Expression builder.

Step 12. Format the page number.

Select the text box for the page number by clicking on the edge, then using the toolbar right align the page number box. Page numbers are typically quite small on the header, so let’s bump down the font to 8 point.

Step 13. Resize the header.

In this example our header isn’t very large, but when we added it SSRS gave us a considerable amount of space. Let’s resize this to something more appropriate.

Hover over the dotted line between the header and report body with your mouse. It should turn into the up/down sizing handle. When it does, click and drag it up.

As an alternative, you could click in the empty area of the header, then in the Properties pane of VS/BIDS enter an explicit Height value. This is useful for situations where you have specific requirements that the header must be of an exact size. This often occurs with things like pre-printed forms or paper with the letterhead already printed on it.

Step 14. Preview the header.

OK, all done with this part. Switch to the Preview tab to see the header in action.

[Screenshot: report preview showing the new header with the title and page number]

Step 15. Add the footer.

Working with footers is identical to working with headers. Start by right clicking in an empty spot in the design area and pick “Add Page Footer”.

Step 16. Add content.

Drag a text box onto the footer. Expand it to take up the entire width of the report, then open the Expression dialog as you did before: right click and pick Expression from the menu.

It’s common for a business to want to copyright their intellectual property, so enter this as your expression:

="Copyright " & Year(Now()) & " ArcaneCode."

Hint: If you select Common Functions, Date & Time in the Category area of the Expression builder, you’ll see many common functions. When you click on one helpful hints will appear to the right.

Since we have a lot of unused space, we’ll again shrink the footer like we did the header. This time though hover over the bottom of the footer to make the resizing mouse icon appear, then drag it up to shrink it.

Step 17. Test in the Preview pane.

Once again, return to the Preview tab, scroll down and the footer should look something like:

[Screenshot: report preview showing the copyright footer]

Other ideas.

The things you can do in headers and footers are nearly infinite. Images, such as your corporate logo, can be used. Trademarks, warnings about intellectual property, print dates, the report name and URL, and the list of parameters used to generate the report are all common things that may appear in the header.

What I learned at TechEd

Last week I was at the Microsoft TechEd conference in North America, along with over 10,000 of my closest friends. I spent a lot of time in the Microsoft floor area talking to people, and came away with some interesting info about new technologies. As I’m sharing some of these at the Steel City SQL user group tonight, I thought I’d share here too.

First up is OData, the Open Data Protocol from Microsoft. It is essentially an ATOM feed, but for data. People can publish in the OData format, and consumers can read the data as either JSON or AtomPub. You can also add security, should you wish to have data available to many consumers but only on a permission basis. You can learn more at http://www.odata.org

Next up is Microsoft’s new “Dallas” project. Dallas is the code name for a data marketplace on its Azure platform. Through Dallas, users and vendors will be able to consume / provide data feeds. Some will be free, others will come at some cost. There is a catalog through which consumers can browse the various feeds available. This is very much in its infancy, but there are a few feeds which you can look at and preview.

Microsoft’s SQL Server 2008 R2 Parallel Data Warehouse looked interesting, although it fits a very niche market. It’s an appliance you can purchase that is essentially a rack of SQL Servers. One is the master server, and coordinates all the child servers. As a DBA you manage what appears to be a “normal” instance of SQL Server. Behind the scenes the controller will propagate changes to the other servers in its hub. Scaling can be achieved by simply adding more servers to the existing rack, or additional racks as needed. PDW becomes economical starting around 10 terabytes and scales to well over 100 terabytes of data.

The folks at Red Gate have a new tool called SQL Search that they have released for free to the community. SQL Search is an add-on for SQL Server Management Studio that does lightning fast searches of object names in your database. Just pick the database name and the term to search for, and SQL Search will populate a grid with all possible matches. If you double click on a row it will navigate SSMS’s Object Explorer pane to the correct spot in the navigation tree for your object. Further, if the object is a view, stored proc, etc., it will even display the SQL of the object and highlight the searched item. And did I mention it’s free?

Speaking of cool free tools, the folks at Confio have created a free version of their popular Ignite tool called IgniteFree. It is a real time performance monitoring tool that works not just with SQL Server but with Oracle and DB2 as well. They have versions of the tool that run on both Windows and Unix/Linux.

PowerPivot continues to fascinate and excite me. While I was at TechEd I won a copy of “PowerPivot for Excel and SharePoint”. I had this on my “to buy” list anyway, so I considered myself lucky. I’m about a sixth of the way through the book and it has been really good so far. It starts with a quick tour of the Excel piece, then walks you through the SharePoint install so you can quickly get up and running in a test environment. Later chapters delve much more deeply into PowerPivot. If you are looking for a good PowerPivot book I would recommend it.

Finally, even if you couldn’t be there you can watch the sessions from this and past TechEds. Microsoft has released them to the general public at http://www.msteched.com/

 

*FTC Disclosure: I am in the “Friends of Red Gate” program, where I get copies of their tools in order to test and provide feedback. In this case the disclaimer probably isn’t necessary since the SQL Search tool is freely available to all, but I’d prefer to keep things above board.

SQL Saturday 41 – Atlanta

Today is April 24, 2010 and I’m in Atlanta speaking at SQL Saturday number 41. I’m giving three sessions today. I guess I’m just a glutton for punishment, LOL.

My first session is an Introduction to Business Intelligence / Data Warehousing. In it I cover the basics; it’s a true introductory talk where we’ll demystify all the buzz words surrounding Business Intelligence. You can download the slides from here.

My next session is Off and Running with PowerPivot for Excel 2010. Learn the ins and outs of this exciting new tool from Microsoft, and see how you can enable your users to do their own Business Intelligence. The slides are available from this location.

OK, an update before this blog entry even posts: Vidas Matelis just published his step by step guide for getting SharePoint 2010, SQL Server 2008 R2, and PowerPivot all up and going on a single box. (And when I say just, I mean it went up just as I was typing up this post.) Vidas knows a lot about PowerPivot, and his is a great blog to add to your short list. I have a link to his blog in my slide deck, but wanted to pass along a specific link to his install guide; you can find it at http://powerpivot-info.com/post/66-step-by-step-guide-on-installing-powerpivot-for-sharepoint .

The final session I’ll be doing is on Full Text Searching. You can download the code samples and slides from my Code Gallery site, http://code.msdn.microsoft.com/SqlServerFTS.

Speaking of Full Text Search, I’ll be doing an Interactive Session at Tech-Ed in New Orleans on Full Text Searching. The session is now in the catalog: http://northamerica.msteched.com/topic/list?keyword=DAT07-INT . If you are coming to New Orleans for TechEd I’d love to see you there. I’ll also be in the Microsoft Data booth during part of the event, so come on by and say Hi!

I hope to be able to sneak in a few sessions today as well, there will be 49 different sessions at SQL Saturday #41 to pick from (7 tracks, and 7 sessions per track) so it promises to have something for everyone. If you want to follow the fun on Twitter, our official hash tag is #sqlsat41 .

Live Streaming from SQL Saturday 41

One of the sponsors for tomorrow’s SQL Saturday in Atlanta Georgia, a company named Set Focus, is going to be live streaming three presentations from the event. I just got the word that my session, "Introduction to Data Warehousing/Business Intelligence" was selected as one of the sessions. My session kicks off the event at 8:30 a.m. Eastern time. Information and a link to the stream site can be found on Set Focus’s blog:

http://blogs.setfocus.com/radar/2010/04/22/streaming-sqlsaturday/

SQL Saturday 41 was sold out some time ago, and there is even quite a waiting list, so if you’re unable to attend then at least you can sit in on three of the sessions via the live stream. The other two sessions to be streamed are "SQL Server Memory Deep Dive" by Kevin Boles and "Database Design Patterns" by Louis Davidson. Both are fellow Microsoft MVPs and excellent presenters, I know you’ll enjoy their presentations as well.

Live streaming technology really excites me. While I feel that you get the best experience and education from being live at the event, I also understand that this is not always possible for everyone. Work conflicts, distance, family obligations, or the event simply being sold out, as this one is, can limit a person’s ability to attend in person. Live streaming events such as SQL Saturday really help us extend our reach into the community and serve those who for whatever reason cannot be with us at the event. I want to give a great big thanks to the folks over at Set Focus for making this happen.

Introducing Microsoft PowerPivot

What is PowerPivot? Well according to Microsoft:

“PowerPivot is Microsoft Self-Service Business Intelligence”

I can see from the glazed looks you are giving your monitor that that was clear as mud. So let’s step back a bit and first define what exactly Business Intelligence is.

Business Intelligence

Business Intelligence, often referred to as simply “BI”, is all about taking data you already have and making sense of it: turning a raw jumble of individual facts into knowledge that you can take informed action on.

In every organization there is already someone who is doing BI, although they may not realize it. Microsoft (and many IT departments) refer to this person as “that guy”: a power user who grabs data from anyplace he (or she) can get it, then uses tools like Excel or Access to slice it, dice it, and analyze it. This person might be an actual Business Analyst, but more often it’s someone for whom BI is not their main job. Some common examples of people doing their own BI today are production managers, accountants, engineers, and sales managers, all of whom need information to do their jobs better. Let’s look at an illustration that will make it a bit clearer.

In this example, put yourself in the role of a sales manager. You have gotten IT to extract all of your sales orders for the last several years into an Excel spreadsheet. In order to determine how well your sales people are doing, you need to measure their performance. You’ve decided that the amount sold will be a good measure, and use Excel to give you totals.

[Screenshot: Excel worksheet listing each salesperson’s Total Sales with a grand total]

In BI terms, the column “Total Sales” is known as a measure, or sometimes a fact, as it measures something, in this case the sales amount. The grand total sales amount is often called an aggregation, as it totals up the individual rows of data that IT gave us. But now you might be wondering why Andy’s sales are so low. Well, now you want to dig deeper and look at sales by year.

[Screenshot: Excel pivot table of sales by salesperson and year]

In BI terms, the names of the sales people are a dimension. Dimensions are often either a “who” (who sold stuff) or a “what” (what stuff did we sell). Places (where was it sold) and dates (when was it sold) are also common dimensions. In this case the sales dates across the top (2007, 2008, 2009) are a date dimension. When we use two or more dimensions to look at our measures, we have a pivot table.
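
If you are curious what feeds a pivot table like this on the database side, it boils down to a simple aggregate query. Here is a rough sketch, assuming a hypothetical SalesOrders table with SalesPerson, OrderDate, and SalesAmount columns (the table and column names are made up purely for illustration):

SELECT SalesPerson                    -- the "who" dimension
     , YEAR(OrderDate) AS SalesYear   -- the "when" dimension
     , SUM(SalesAmount) AS TotalSales -- the measure, aggregated
  FROM dbo.SalesOrders
 GROUP BY SalesPerson, YEAR(OrderDate)
 ORDER BY SalesPerson, SalesYear;

Each row of the result is a measure (TotalSales) sliced by two dimensions (SalesPerson and SalesYear), which is exactly the data behind the pivot table above.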

Now we can see a picture emerging. It’s obvious that Andy must have been hired as a new salesperson in late 2008, since he shows no sales for 2007 and only a very small amount in 2008. But for Paul and Kimberly we can look at something called trends in the BI world. Kimberly shows a nice even trend, rising slowly over the last three years, and earns a gold star as our top performer.

By being able to drill down into our data, we spot another trend that was not readily obvious when just looking at the grand totals. Paul has been trending downward so fast the speed of light looks slow. Clearly then we now have information to take action on, commonly known as actionable intelligence.

So remind me, why do we need PowerPivot?

As you can see in the above example, “that guy” in your company clearly has a need to look at this data in order to do his job. Not only does he need to review it, he also has the issue of how to share this information with his co-workers. Unfortunately in the past the tools available to “that guy” have had some drawbacks. The two main tools used by our analyst have been either Excel, or a complete BI solution involving a data warehouse and SQL Server Analysis Services.

Excel’s main limitations center around the volume of data needed to do good analysis. Excel has limits to the number of rows it can store, and for large datasets a spreadsheet can consume equally large amounts of disk space. This makes the spreadsheet difficult to share with coworkers. In addition mathematical functions like aggregations could be slow. On the good side, Excel is readily available to most workers, and a solution can be put together fairly quickly.

A full blown BI solution has some major benefits over the Excel solution. A data warehouse is created, and then SQL Server Analysis Services (often abbreviated as SSAS) is used to precalculate aggregations for every possible way an analyst might wish to look at them. The data is then very easy to share via tools like Excel and SQL Server Reporting Services. While a very robust and powerful solution, it does have some drawbacks. It can take quite a bit of time to design, code, and implement both the data warehouse and the analysis services pieces of the solution. In addition it can also be expensive for IT to implement such a system.

Faster than a speeding bullet, more powerful than a locomotive, it’s PowerPivot!

PowerPivot combines the best of both worlds. In fact, it’s not one tool but two: PowerPivot for Microsoft Excel 2010, and PowerPivot for SharePoint 2010. What’s the difference you ask? Good question.

PowerPivot for Microsoft Excel 2010

PowerPivot acts as an Add-on for Excel 2010, and in many ways is quite revolutionary. First, it brings the full power of SQL Server Analysis Services right into Excel. All of the speed and power of SSAS is available right on your desktop. Second, it uses a compression technology that allows vast amounts of data to be saved in a minimal amount of space. Millions of rows of data can now be stored, sorted, and aggregated in a reasonable amount of disk space with great speed.

PowerPivot can draw its data from a wide variety of sources. As you might expect, it can pull from almost any database. Additionally it can draw data from news feeds, SQL Server Reporting Services, and other Excel sheets; data can even be typed in manually if need be.

Another issue that often faces the business analyst is the freshness of the data. The information is only as fresh as the date it was last imported into Excel. Traditionally “that guy” only got extracts of the database as IT had time, since it was often a time consuming process. PowerPivot addresses this through its linked tables feature. PowerPivot will remember where your data came from, and with one simple button click can refresh the spreadsheet with the latest information.

Because PowerPivot sits inside Microsoft Excel, it not only can create basic pivot tables but has all the full featured functionality of Excel at its disposal. It can format pivot tables in a wide array of styles, create pivot charts and graphs, and combine these together into useful dashboards. Additionally PowerPivot has a rich set of mathematical functionality, combining the existing functions already in Excel with an additional set of functions called Data Analysis eXpressions, or DAX.

PowerPivot for SharePoint 2010

PowerPivot for Excel 2010 clearly solves several issues around analysis. It allows users to quickly create spreadsheets, pivot tables, charts, and more in a compact amount of space. If you recall though, creation was only half of “that guy’s” problem. The other half was sharing his analysis with the rest of his organization. That’s where PowerPivot for SharePoint 2010 comes into play.

Placing a PowerPivot Excel workbook in SharePoint 2010 not only enables traditional file sharing, but also activates several additional features. First, the spreadsheet is hosted right in the web browser. Thus users who might not have made the transition to Excel 2010 can still use the PowerPivot created workbook, slicing and filtering the data to get the information they require.

Data can also be refreshed on an automated, scheduled basis. This ensures the data is always up to date when doing analysis. Dashboards can also be created from the contents of a worksheet and displayed in SharePoint. Finally these PowerPivot created worksheets can be used as data sources for such tools as SQL Server Reporting Services.

Limitations

First, let me preface this by saying as of this writing all of the components are either in CTP (Community Technology Preview, a pre-beta) or Beta state. Thus there could be some changes between now and their final release next year.

To use the PowerPivot for Excel 2010 components, all you have to have is Excel 2010 and the PowerPivot add-in. If you want to share the workbook and get all the rich functionality SharePoint has to offer, you’ll have to have SharePoint 2010, running Excel Services and PowerPivot 2010 Services. You’ll also have to have SQL Server 2008 R2 Analysis Services running on the SharePoint 2010 box. Since you’ll have to have a SQL Server instance installed to support SharePoint this is not a huge limitation, especially since SSAS comes with SQL Server at no extra cost.

One thing I wish to make clear, SharePoint 2010 itself can run using any version of SQL Server from SQL Server 2005 on. It is the PowerPivot service that requires 2008 R2 Analysis Services.

One other important item to note: at some point the load upon the SharePoint 2010 server may grow too large if especially complex analysis is being done. Fortunately SharePoint 2010 ships with several tools that allow administrators to monitor the load and plan accordingly. At the point where the load is too big, it is a clear indication it’s time to transition from a PowerPivot solution to a full BI solution using a data warehouse and SQL Server Analysis Services.

What does PowerPivot mean for business users?

For business users, and especially “that guy”, it means complex analysis tools can be created in a short amount of time. Rich functionality makes it easier to spot trends and produce meaningful charts and graphs. It also means this information can be shared with others in the organization easily, without imposing large burdens on the corporate e-mail system or local file sharing mechanisms.

No longer will users be dependent on IT for their analysis; they will have the power to create everything they need on their own, truly bringing “self service BI” to fruition.

What does PowerPivot mean for Business Intelligence IT Pros?

The first reaction many BI developers have when hearing about PowerPivot is “oh no, this is going to put me out of a job!” Far from it, I firmly believe PowerPivot will create even more work for BI Professionals like myself.

As upper management grows to rely on the information provided by PowerPivot, they will also begin to understand the true value BI can bring to an organization. Selling a new BI solution into an organization where none currently exists can be difficult, as it can be hard to visualize how such a solution would work and the value it brings. PowerPivot allows BI functionality to be brought into an organization at a low development cost, proving the value of BI with minimal investment. Thus when there is a need to implement a larger, traditional BI project those same managers will be more forthcoming with the dollars.

Second, as users pull more and more data, they are going to want that data better organized than they will find in their current transactional business systems. This will in turn spur the need to create many new data warehouses. Likewise the IT department will also want data warehouses created, to reduce the load placed on those same transactional business systems.

I also foresee PowerPivot being used by BI Pros themselves to create solutions. The database structure of many transactional database systems can be difficult to understand even for experienced IT people, much less users. BI Pros can use PowerPivot to add a layer of abstraction between the database and the users, allowing business analysts to do their job without having to learn the complexity of a database system.

BI Pros can also use PowerPivot to implement quick turnaround solutions for customers, bringing more value for the customer’s dollar. When a BI Pro can prove himself (or herself) by providing rich functionality in a short time frame, it’s almost always the case that they are brought back in for multiple engagements.

PowerPivot also provides great value to BI Pros who are employed full time in an enterprise organization. They can create solutions much quicker than before, freeing them up to do other valuable tasks. In addition PowerPivot solutions can provide a “stop gap” solution, pushing the date at which the organization needs to spend the dollars for a full blown BI solution and allowing IT to plan better.

Finally I see great value in PowerPivot as a prototyping tool for larger BI projects. Now users can see their data, interact with it, analyze it, and ensure the required measures and dimensions are present before proceeding with the larger project.

I’ll reiterate, if anything I believe PowerPivot will create an explosion of work for the Business Intelligence Professional.

Where can I learn more?

Well right here for one. I have become quite interested in PowerPivot since seeing it at the SQL PASS 2009 Summit. I think it will be a valuable tool for both myself and my customers. This will be the first of many blog posts to come on PowerPivot. I am also beginning a series of presentations on PowerPivot for local user groups and code camp events. The first will be Saturday, November 21st 2009 at the SharePoint Saturday in Birmingham Alabama, but there will be many more to come. (If you’d like me to come speak at your group just shoot me an e-mail and we’ll see what we can arrange.)

There’s also the PowerPivot site itself:

I’ve also found a small handful of blogs on PowerPivot, listed in no particular order:

Summary

Thanks for sticking with me, I know this was a rather long blog post but PowerPivot has a lot of rich functionality to offer. While PowerPivot is still in the CTP/Beta stage as of this writing, I see more and more interest in the community, which will continue to grow as PowerPivot moves closer to release. I hope this post has set you off on the right step and you’ll continue to come back for more information.

Full Text Searching a FILESTREAM VARBINARY (MAX) Column

In the past I’ve written that Full Text Searching has the ability to index documents stored in a VARBINARY(MAX) field. However, I have never really gone into any details on how to do this. Today I will remedy that by demonstrating how to Full Text Search not only a VARBINARY(MAX) field, but one that has been stored using FILESTREAM. Even though these examples will be done against the data we’ve stored with FILESTREAM over the lessons from the last few days, know that this technique is identical for binary objects stored in a VARBINARY(MAX) field without using FILESTREAM.

Let’s start by creating a catalog to hold our Full Text data.


CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT;

Pretty normal. Now we need to create a full text index on the “DocumentRepository” table we created in this series. When you look at the syntax though, you may notice a minor difference from the CREATE FULLTEXT INDEX examples I’ve shown in the past:


CREATE FULLTEXT INDEX ON dbo.DocumentRepository
(DocumentName, Document TYPE COLUMN DocumentExtension)
KEY INDEX PK__Document__3214EC277F60ED59
ON FileStreamFTSCatalog
WITH CHANGE_TRACKING AUTO;

Here you can see I am indexing two fields. The first is “DocumentName”, which is passed in as the first parameter and looks like other examples. We won’t actually be using it in this example; however, I included it to demonstrate that you can index multiple columns even when one of them is a VARBINARY(MAX) column.

The second parameter indexes the VARBINARY(MAX) “Document” column itself, but notice the TYPE COLUMN clause after the column name. In order to full text index a VARBINARY(MAX) column you must also have a column holding the file’s extension, and you pass the name of that column after the TYPE COLUMN keyword. In this example the document extension is stored in the “DocumentExtension” column; since the extension can live in a column with any name, this is how the Full Text engine knows which column to use. The remainder of the command is like other examples I’ve shown in the past.
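
As a refresher, the “DocumentRepository” table from the earlier FILESTREAM posts looked roughly like the sketch below. The exact names, data types, and sizes in the original series may differ, so treat this only as an illustration (it also assumes the database already has a FILESTREAM filegroup, which FILESTREAM storage requires); the important parts are the plain DocumentExtension column sitting alongside the VARBINARY(MAX) FILESTREAM column, and the ROWGUIDCOL column FILESTREAM requires.

CREATE TABLE dbo.DocumentRepository
(
  ID INT IDENTITY(1,1) PRIMARY KEY,   -- the key index referenced above
  RowID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(), -- required by FILESTREAM; column name illustrative
  DocumentName NVARCHAR(256),         -- indexed as the first column above
  DocumentExtension NVARCHAR(10),     -- e.g. '.doc', used as the TYPE COLUMN
  Document VARBINARY(MAX) FILESTREAM  -- the document itself
);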

Now we can run a normal SELECT…CONTAINS query against the “Document” field.


SELECT ID, DocumentName 
FROM dbo.DocumentRepository
WHERE CONTAINS(Document, 'Shrew');

I’ll leave it to you to run, for me it returned one row, with “TheTamingOfTheShrew.doc”. If you want to try it again, use “Elinor”, and you should get back “KingJohn.doc”.

As you can see, performing a Full Text Search against a VARBINARY(MAX) column is quite easy, all you have to do is indicate the document type by using the TYPE COLUMN. There are two more things you should know. First, the column containing the document extension must be of type CHAR, NCHAR, VARCHAR, or NVARCHAR. Second, the document type must be recognized by SQL Server. To get a list of all valid document types, simply query the fulltext_document_types catalog view like so:


SELECT * FROM sys.fulltext_document_types;

This will give you a list of all file extensions understood by SQL Server. Each row actually represents a filter; each filter corresponds to a DLL that implements the IFilter interface. It is possible to add additional filters to the system. For example, Microsoft offers the “Microsoft Filter Pack”. You may have noticed that out of the box SQL Server 2008 supports the older Office 2003 documents, but not the more recent Office 2007 formats. To add these newer formats to your SQL Server, Microsoft provides the aforementioned filter pack. While installing it is beyond the scope of this article, you can find complete instructions for download and installation at http://support.microsoft.com/default.aspx?scid=kb;en-us;945934 .
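
After installing a filter pack, SQL Server generally will not use the new filters until you tell it to load them and then repopulate your catalogs. The sketch below shows the steps that are typically suggested; the KB article linked above has the authoritative instructions, so treat this as a rough guide only.

-- Tell SQL Server to load the filters installed in the operating system
EXEC sp_fulltext_service 'load_os_resources', 1;

-- Often suggested as well, since some third party filters are unsigned
EXEC sp_fulltext_service 'verify_signature', 0;

-- After restarting the SQL Server / full text service, rebuild the
-- catalog so existing documents get indexed with the new filters
ALTER FULLTEXT CATALOG FileStreamFTSCatalog REBUILD;

Once that’s done, querying sys.fulltext_document_types again should show the new extensions, such as .docx.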

The Full Text Search features provided by SQL Server continue to amaze me with how powerful they are, yet how easy they are to implement. With the information here you can easily search through documents stored in a VARBINARY(MAX) field, even when those documents are actually stored via the new SQL Server 2008 FILESTREAM.

Accessing FILESTREAM Data From A Client .NET Application – Part 2 Downloading a File

In the previous entry we covered how to upload a file to SQL Server using FILESTREAM, new to SQL Server 2008. In this post we will look at retrieving a file from SQL Server using FILESTREAM. If you missed yesterday’s installment, a simple front end was created there; the full project can be found at the Code Gallery site http://code.msdn.microsoft.com/FileStreamFTS .

The interface is very simple:

[Screenshot: the FileLoader application’s main form]

The grid is a DataGridView that shows the ID and Document information from the table we previously created. (If you want to see the code used to populate the grid, see the project.) The user picks a row, then clicks on the Get File button.

    private void btnGetFile_Click(object sender, EventArgs e)
    {
      // Reset in case it was used previously
      lblStatus.Text = "";

      if (dgvFiles.CurrentRow != null)
      {
        // Grab the ID (Primary Key) for the current row
        int ID = (int)dgvFiles.CurrentRow.Cells[0].Value;

        // Now save the file to the folder passed in the second
        // parameter.
        FileTransfer.GetFile(ID, @"D:\Docs\Output\");

        // And let the user know it's OK
        lblStatus.Text = "File Retrieved";
      }
    }

The code is very simple; the heart of it is the FileTransfer.GetFile static method. Two values are passed in: the integer ID, which is the primary key from the database, and the path to save the file to. Here I simply hard coded a path; in a real life application you will want to give the user the ability to enter a path. Let’s take a look at the GetFile routine.

    public static void GetFile(int ID, string outputPath)
    {
      // Setup database connection
      SqlConnection sqlConnection = new SqlConnection(
                "Integrated Security=true;server=(local)");

      SqlCommand sqlCommand = new SqlCommand();
      sqlCommand.Connection = sqlConnection;

      try
      {
        sqlConnection.Open();

        // Everything we do with FILESTREAM must always be in
        // the context of a transaction, so we'll start with
        // creating one.
        SqlTransaction transaction
          = sqlConnection.BeginTransaction("mainTransaction");
        sqlCommand.Transaction = transaction;

        // The SQL gives us 3 values. First the PathName() method of
        // the Document field is called; we'll need it to use the API.
        // Second we call a special function that will tell us what
        // the context is for the current transaction, in this case
        // the "mainTransaction" we started above. Finally it gives
        // the name of the document, which the app will use when it
        // creates the document but is not strictly required as
        // part of the FILESTREAM.
        sqlCommand.CommandText
          = "SELECT Document.PathName()"
          + ", GET_FILESTREAM_TRANSACTION_CONTEXT() "
          + ", DocumentName "
          + "FROM FileStreamFTS.dbo.DocumentRepository "
          + "WHERE ID=@theID ";

        sqlCommand.Parameters.Add(
          "@theID", SqlDbType.Int).Value = ID;

        SqlDataReader reader = sqlCommand.ExecuteReader();
        if (reader.Read() == false)
        {
          throw new Exception("Unable to get BLOB data");
        }

        // OK we have some data, pull it out of the reader into locals
        string path = (string)reader[0];
        byte[] context = (byte[])reader[1];
        string outputFilename = (string)reader[2];
        int length = context.Length;
        reader.Close();

        // Now we need to use the API we declared at the top of this class
        // in order to get a handle.
        SafeFileHandle handle = OpenSqlFilestream(
          path
          , DESIRED_ACCESS_READ
          , SQL_FILESTREAM_OPEN_NO_FLAGS
          , context
          , (UInt32)length, 0);

        // Using the handle we just got, we can open up a stream from
        // the database.
        FileStream databaseStream = new FileStream(
          handle, FileAccess.Read);

        // This file stream will be used to copy the data to disk
        FileStream outputStream
          = File.Create(outputPath + outputFilename);

        // Setup a buffer to hold the streamed data
        int blockSize = 1024 * 512;
        byte[] buffer = new byte[blockSize];

        // There are two ways we could get the data. The simplest way
        // is to read the data, then immediately feed it to the output
        // stream using its Write method (shown below, commented out).
        // The second way is to load the data into an in-memory
        // collection (here a generic List of bytes). This would let
        // you manipulate the data in memory, then write it out (as
        // shown here), reupload it to another data stream, or do
        // something else entirely.
        // If you want to go the simple way, just remove all the
        // fileBytes lines and uncomment the outputStream line.
        List<byte> fileBytes = new List<byte>();
        int bytesRead = databaseStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
          //outputStream.Write(buffer, 0, bytesRead);
          // Only copy the bytes actually read on this pass
          for (int i = 0; i < bytesRead; i++)
            fileBytes.Add(buffer[i]);
          bytesRead = databaseStream.Read(buffer, 0, buffer.Length);
        }

        // Write out what is in the LIST to disk
        foreach (byte b in fileBytes)
        {
          byte[] barr = new byte[1];
          barr[0] = b;
          outputStream.Write(barr, 0, 1);
        }

        // Close the stream from the databaseStream
        databaseStream.Close();

        // Write out the file
        outputStream.Close();

        // Finally we should commit the transaction.
        sqlCommand.Transaction.Commit();
      }
      catch (System.Exception ex)
      {
        MessageBox.Show(ex.ToString());
      }
      finally
      {
        sqlConnection.Close();
      }
      return;
    }

The routine kicks off by opening a connection, then establishing a transaction. Remember from the previous lesson that every time you work with a FILESTREAM it has to be in a transaction. Next we basically duplicate the SQL used in the previous lesson, returning the path name, transaction context, and document name. The only difference is we pass in the ID as a parameter. With that, just like in the previous example we call the OpenSqlFilestream API. Note one difference: in this example the second parameter is “DESIRED_ACCESS_READ”, as opposed to the write access we indicated previously.

Once we have the “handle” we can create a FileStream for reading from the database. In this example I loop through the file stream, loading the data into a LIST of bytes. Once in memory we are free to work with it as we need to. In this example I simply loop back through the generic List and write the data to the file stream we opened on the disk for writing. If all you are doing is writing, it would be somewhat more efficient to write the code like so:

        int bytesRead = databaseStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
          // Write only the bytes actually read, then read the next block
          outputStream.Write(buffer, 0, bytesRead);
          bytesRead = databaseStream.Read(buffer, 0, buffer.Length);
        }

        // Close the stream from the databaseStream
        databaseStream.Close();

Here I simply eliminate the local List of bytes and write each buffer directly to disk. Either way, the remainder is simple: close all the streams, commit the transaction, and close the database connection.
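If you are on .NET 4 or later (the original sample targets Visual Studio 2008, so treat this as an optional variant rather than part of the sample), the same copy can be written even more compactly with using blocks and Stream.CopyTo:

        // Hypothetical variant, assuming .NET 4+ and the same handle, outputPath,
        // outputFilename, and sqlCommand variables used above. CopyTo performs
        // the buffered read/write loop for us.
        using (FileStream databaseStream = new FileStream(handle, FileAccess.Read))
        using (FileStream outputStream = File.Create(outputPath + outputFilename))
        {
          databaseStream.CopyTo(outputStream);
        }

        // The FILESTREAM transaction still has to be committed afterwards.
        sqlCommand.Transaction.Commit();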

This concludes the series on how to use FILESTREAM. In future posts we’ll look at how to do Full Text Search against FILESTREAM-stored objects.

Accessing FILESTREAM Data From A Client .NET Application – Part 1 Uploading a File

The best way to work with documents stored in the database is via a .Net application. I created a simple Windows Forms project to access the table created in the previous lessons. I named the application FileLoader; you can find the entire project at the Code Gallery site http://code.msdn.microsoft.com/FileStreamFTS .

The interface is very simple:

[Screenshot: the FileLoader application’s interface]

As you can see there are two main functions. The upper half uploads a file to SQL Server. The lower half displays the files already in the table, lets the user pick one, and then saves it locally when the GetFile button is clicked. Today we’ll look at the upload functionality. Here is the code:

    private void btnUploadFile_Click(object sender, EventArgs e)
    {
      // Reset in case it was used previously
      lblStatus.Text = "";

      // Make sure user entered something
      if (txtFile.Text.Length == 0)
      {
        MessageBox.Show("Must supply a file name");
        lblStatus.Text = "Must supply file name";
        return;
      }

      // Make sure what user entered is valid
      FileInfo fi = new FileInfo(txtFile.Text);
      if (!fi.Exists)
      {
        MessageBox.Show("The file you entered does not exist.");
        lblStatus.Text = "The file you entered does not exist.";
        return;
      }

      // Upload the file to the database
      FileTransfer.UploadFile(txtFile.Text);

      // Refresh the datagrid to show the newly added file
      LoadDataGridView();

      // Let user know it was uploaded
      lblStatus.Text = fi.Name + " Uploaded";
    }

The line that does the real work is the call to FileTransfer.UploadFile, a static method in a class I named FileTransfer.cs. In order to use FILESTREAM there is a Win32 API call we have to make, so at the top of FileTransfer there are quite a few declarations. These are pretty much a straight copy from the MSDN help files.

    //These constants are passed to the OpenSqlFilestream()
    //API DesiredAccess parameter. They define the type
    //of BLOB access that is needed by the application.

    const UInt32 DESIRED_ACCESS_READ = 0x00000000;
    const UInt32 DESIRED_ACCESS_WRITE = 0x00000001;
    const UInt32 DESIRED_ACCESS_READWRITE = 0x00000002;

    //These constants are passed to the OpenSqlFilestream()
    //API OpenOptions parameter. They allow you to specify
    //how the application will access the FILESTREAM BLOB
    //data. If you do not want this ability, you can pass in
    //the value 0. In this code sample, the value 0 has
    //been defined as SQL_FILESTREAM_OPEN_NO_FLAGS.

    const UInt32 SQL_FILESTREAM_OPEN_NO_FLAGS = 0x00000000;
    const UInt32 SQL_FILESTREAM_OPEN_FLAG_ASYNC = 0x00000001;
    const UInt32 SQL_FILESTREAM_OPEN_FLAG_NO_BUFFERING = 0x00000002;
    const UInt32 SQL_FILESTREAM_OPEN_FLAG_NO_WRITE_THROUGH = 0x00000004;
    const UInt32 SQL_FILESTREAM_OPEN_FLAG_SEQUENTIAL_SCAN = 0x00000008;
    const UInt32 SQL_FILESTREAM_OPEN_FLAG_RANDOM_ACCESS = 0x00000010;

    //This statement imports the OpenSqlFilestream API so that it
    //can be called from the methods below. The final parameter is
    //the allocation size, passed here as an Int64.
    [DllImport("sqlncli10.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle OpenSqlFilestream(
                string FilestreamPath,
                uint DesiredAccess,
                uint OpenOptions,
                byte[] FilestreamTransactionContext,
                uint FilestreamTransactionContextLength,
                Int64 AllocationSize);

    //This statement imports the Win32 API GetLastError().
    //This is necessary to check whether OpenSqlFilestream
    //succeeded in returning a valid handle.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern UInt32 GetLastError();

OK, with that out of the way, I’ve created a public static method to upload the file. Here is the full routine:

    public static void UploadFile(string fileName)
    {
      // Establish db connection
      SqlConnection sqlConnection = new SqlConnection(
                "Integrated Security=true;server=(local)");
      SqlTransaction transaction = null;

      // Create a FileInfo object so you can easily get the
      // name and extension. As an alternative you could
      // choose to pass them in, or use some other way
      // to extract the extension and name.
      FileInfo fi = new FileInfo(fileName);

      try
      {
        // Open the source file as a stream
        FileStream sourceFile = new FileStream(fileName
          , FileMode.Open, FileAccess.Read);

        // Create the row in the database
        sqlConnection.Open();

        SqlCommand cmd = new SqlCommand();
        cmd.Connection = sqlConnection;
        cmd.CommandText = "INSERT INTO "
          + "FileStreamFTS.dbo.DocumentRepository "
          + "(DocumentExtension, DocumentName) VALUES ('"
          + fi.Extension + "', '"
          + fi.Name + "')";
        cmd.ExecuteNonQuery();

        // Now upload the file. It must be done inside a transaction.
        transaction = sqlConnection.BeginTransaction("mainTransaction");
        cmd.Transaction = transaction;
        cmd.CommandText = "SELECT Document.PathName(), "
         + "GET_FILESTREAM_TRANSACTION_CONTEXT() "
         + "FROM FileStreamFTS.dbo.DocumentRepository "
         + "WHERE ID=(select max(id) from FileStreamFTS.dbo.DocumentRepository)";
        SqlDataReader rdr = cmd.ExecuteReader();
        if (rdr.Read() == false)
        {
          throw new Exception("Could not get file stream context");
        }

        // Get the path
        string path = (string)rdr[0];
        // Get the file stream transaction context
        byte[] context = (byte[])rdr[1];
        int length = context.Length;
        rdr.Close();

        // Now use the API to get a reference (handle) to the filestream
        SafeFileHandle handle = OpenSqlFilestream(path
          , DESIRED_ACCESS_WRITE
          , SQL_FILESTREAM_OPEN_NO_FLAGS
          , context, (UInt32)length, 0);

        // Now create a true .Net FileStream to the database
        // using the handle we got in the step above
        FileStream dbStream = new FileStream(handle, FileAccess.Write);

        // Setup a buffer to hold the data we read from disk
        int blocksize = 1024 * 512;
        byte[] buffer = new byte[blocksize];

        // Read from the file and write to the DB, writing only the
        // number of bytes actually read on each pass
        int bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
          dbStream.Write(buffer, 0, bytesRead);
          bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
        }

        // Done reading, close all of our streams and commit the file
        dbStream.Close();
        sourceFile.Close();
        transaction.Commit();
      }
      catch (Exception)
      {
        if (transaction != null)
        {
          transaction.Rollback();
        }
        throw;
      }
      finally
      {
        sqlConnection.Close();
      }
    }

First we open a connection to the SQL Server, then create a FileInfo object to make it simple to extract the file name and extension. Next a record is inserted into the database to act as a placeholder; it has the name of the file and its extension, but no file yet. I also open a FileStream on the source file on disk, which we’ll need later to upload the file’s contents.
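One thing worth noting: the INSERT above builds its SQL by concatenating the file name into the command text, which is fine for a demo but is open to SQL injection and will break on file names containing apostrophes. A parameterized version, shown here as a sketch rather than as part of the original sample, avoids both problems:

        // Hypothetical parameterized version of the placeholder INSERT,
        // assuming the same table and an open SqlConnection named sqlConnection.
        SqlCommand cmd = new SqlCommand(
            "INSERT INTO FileStreamFTS.dbo.DocumentRepository "
          + "(DocumentExtension, DocumentName) VALUES (@Extension, @Name)"
          , sqlConnection);
        cmd.Parameters.AddWithValue("@Extension", fi.Extension);
        cmd.Parameters.AddWithValue("@Name", fi.Name);
        cmd.ExecuteNonQuery();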

Next you will see that I begin a transaction. Every time you work with FILESTREAM it must be in the context of a transaction. After that a SqlDataReader is created that returns two pieces of information. The first field calls the PathName() function on the Document column in our table; we will need that path shortly when we call the API. The second field comes from the GET_FILESTREAM_TRANSACTION_CONTEXT function and returns the transaction context for the transaction we just started. Note this is not the transaction’s name (in this example “mainTransaction”) but a special binary token. These two values are then copied into local variables which will be used in calling the OpenSqlFilestream API.

Next comes the call to the OpenSqlFilestream API, which returns a “handle”. That handle is used to create a FileStream object (here named dbStream), and with it we can upload the file. Now the main work begins. After setting up a buffer, we read from the source file stream into the buffer, then write the bytes we just read to the database FileStream. The loop continues until there are no more bytes left in the source.
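Incidentally, the GetLastError import declared earlier exists so you can tell whether OpenSqlFilestream actually handed back a usable handle. A small defensive check, again a sketch rather than code from the sample, might look like the following; from managed code Marshal.GetLastWin32Error() is generally the safer way to read the error code.

        // Hypothetical check, not in the original sample: make sure the handle
        // is valid before wrapping it in a FileStream.
        if (handle.IsInvalid)
        {
          int err = System.Runtime.InteropServices.Marshal.GetLastWin32Error();
          throw new System.ComponentModel.Win32Exception(err,
            "OpenSqlFilestream failed with Win32 error " + err);
        }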

At this point we are essentially done. We close the streams, commit the transaction, and in the finally block close the SQL database connection. The file should now be in the database. I do want to point out one thing: in the SQL that retrieves the information for the row just inserted, I use a subquery to get max(id), essentially returning the last row inserted. This is fine for this simple example, where the database has just one user. In a production system with many users, however, you should use an alternate method to return the row you need; otherwise two users could insert rows at the same time and both could get back the same max(id). It will not happen often, but when it does it will be very hard to debug.
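One common approach, sketched here rather than taken from the sample project and assuming ID is an integer identity column, is to have the INSERT return the new ID itself via the OUTPUT clause, then query for the FILESTREAM context using that specific ID:

        // Hypothetical sketch: capture the new row's ID from the INSERT itself,
        // so a concurrent insert by another user cannot hand us the wrong row.
        cmd.CommandText = "INSERT INTO FileStreamFTS.dbo.DocumentRepository "
          + "(DocumentExtension, DocumentName) "
          + "OUTPUT INSERTED.ID "
          + "VALUES (@Extension, @Name)";
        cmd.Parameters.AddWithValue("@Extension", fi.Extension);
        cmd.Parameters.AddWithValue("@Name", fi.Name);
        int newId = (int)cmd.ExecuteScalar();

        // Later, fetch the FILESTREAM path and context for exactly that row.
        cmd.Parameters.Clear();
        cmd.CommandText = "SELECT Document.PathName(), "
          + "GET_FILESTREAM_TRANSACTION_CONTEXT() "
          + "FROM FileStreamFTS.dbo.DocumentRepository WHERE ID = @ID";
        cmd.Parameters.AddWithValue("@ID", newId);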

That handles uploading files to SQL Server via FILESTREAM. In the next installment we’ll look at how to retrieve the file we just uploaded.

SQL Server Sample Data – The SQL Name Game

Like most folks, I seem to have a perpetual need for realistic test data. While there are many sample databases available, sometimes the need is quite simple: all I need is some names, and perhaps dates and phone numbers, to use when testing my applications, SSIS packages, or SQL Server reports. I decided to take care of this need once and for all, and set out with a simple goal: at the conclusion of my work I wanted a realistic looking, but totally fake, set of data. I wanted to produce it by the simplest means possible, using whatever tools I had available. Finally, I wanted to do it as quickly as possible.

Along the way I documented my efforts and created a sample table with 100,000 rows. When I started I planned to publish everything as a blog post, but it turned out to be far too much for a single post, so I documented everything in a white paper and uploaded all the code to an MSDN Code Gallery site. Note that while I used the 2008 versions of SQL Server and Visual Studio, the SQL scripts should run just fine on SQL Server 2005.

You can find everything at http://code.msdn.microsoft.com/SqlServerSampleData . Look in the downloads section for the complete PDF with all the details, as well as all of the sample data. Using the techniques outlined in the white paper you can easily generate your own test data for a wide variety of projects.