Off and Running with PowerPivot for Excel 2010 at the Huntsville .Net Users Group

Tonight, April 19, I will be presenting at the Huntsville .Net Users Group. The subject will be PowerPivot for Excel 2010. Attendees will get to learn all about Microsoft’s new self-service Business Intelligence tool. Both the Excel and SharePoint pieces of PowerPivot will be discussed, plus the audience will get to see demonstrations of the new Excel add-on.

The slides in PDF format are available, PowerPivot.pdf

Spring seems to be a busy time for user groups, code camps, and SQL and SharePoint Saturdays. This Saturday, April 24 I will be speaking at the SQL Saturday in Atlanta Georgia. On May 1 I will be presenting at the Huntsville, Alabama SharePoint Saturday. The SQL Server event is already sold out, with a waiting list. There are still a few seats left for the share point Saturday event so hurry now before it gets sold out too!

Alabama Code Camp Mobile 2010

Last Saturday was the Alabama Code Camp, held in Mobile AL. For those unfamiliar with the Alabama Code Camps, we hold on average two a year, and they shift from city to city with different user groups acting as the host group. Other cities include Huntsville, Birmingham, and Montgomery. This time though the Lower Alabama Dot Net User Group under the leadership of Ryan Duclos hosted, and what a great event it was. Everything ran smoothly, there was plenty of drinks and pizza to go around, and some good swag to boot. A big congrats to Ryan and his team of volunteers for a great event, also thanks to Microsoft for sponsoring and the University of South Alabama for the venue.

I was kept busy at this code camp, doing three sessions. The first session was “Introduction to Microsoft PowerPivot”. The slide deck can be found at http://arcanecode.files.wordpress.com/2010/01/powerpivot_long.pdf. To see all my PowerPivot posts, simply pick it in the categories to the right or use this link: http://arcanecode.com/category/powerpivot/.

My second session was on Full Text Searching. You can find code samples and the PDF for the presentation at my code gallery site, http://code.msdn.microsoft.com/SqlServerFTS.

The final presentation was an introduction to Business Intelligence and Data Warehousing. Here is the link to the presentations slides in PDF format. As promised in the session I added the additional information for the Kimball Group book.

A quick apology for my delay in posting, a nasty head cold has had me in Zombie land since I got back. Thanks to all who attended, I appreciate you being very interactive, lots of questions, and very attentive. I look forward to the next time Mobile hosts the Alabama Code Camp.

PowerPivot for Excel 2010 at the Steel City SQL Users Group January 19, 2009

Today I am at the Steel City SQL Users Group, presenting on Microsoft’s newest BI Tool, PowerPivot.

The slide deck for this meeting can be found at:

http://arcanecode.files.wordpress.com/2010/01/powerpivot_long.pdf

To see all my posts so far on PowerPivot, you can use the link below to filter.

http://arcanecode.com/category/powerpivot/

Thanks for coming!

Refreshing PowerPivot Data

If you remember my Intro to PowerPivot post, you’ll recall that every organization has that power user who leveraged the tools he had on hand, mostly Excel, to do their own form of Business Intelligence. One headache that plagued “that guy” was the freshness of data. It could be quite difficult to keep the data used in his or her spreadsheets up to date. One of the great boons that PowerPivot offers is the ability to quickly and easily refresh data.

Once again let’s return to the Excel 2010 spreadsheet we created in my Combining Data from Multiple Sources in PowerPivot. In this example we’ll put ourselves in the role of a salesperson, creating an analysis for one of our best customers, one Lucy Harrington. Going to the Customer tab in our PowerPivot workbook, we click on the drop down menu for the FirstName column. Clear the “Select All” option then scroll down and only check the first name “Lucy” which should be sufficient to find only the customer we wish.

clip_image001

In looking at her record, we are horrified to see that some nit-wit in data entry had improperly given our best customer the title of “Mr.” even though we know Lucy is not only a “Mrs.” but as a newlywed very insistent on the use of the Mrs. title.

clip_image002

Correcting this is easy enough, in the source system we can go make the fix, or perhaps we bribe the DBA with lunch so he’ll make our data fix a high priority and issue this update command to the database:

UPDATE SalesLT.Customer 
   SET Title = 'Mrs.'
 WHERE FirstName = 'Lucy'

So now how do we refresh our data in PowerPivot? Easy enough. In PowerPivot on the Home tab you’ll notice a Refresh button in the Connections group. Clicking on it gives us two options, one to refresh just the current table we are viewing, another to refresh all of the tables in our PowerPivot work area. For this example we’ll pick just the Refresh command.

clip_image003

Once you pick Refresh the Data Refresh dialog appears to let you know of its progress. When done it will let you know of its success and you can then click the Close button to close the window. At this point you’ll see some behavior that I can only attribute to the fact that as of this writing PowerPivot is still in the CTP stage. My screen flashed a few times, then when it was done I was on the last tab in my PowerPivot workbook, the CountryInfo tab. Simply click on the Customer tab to return.

A great thing about the Refresh is that it left our filters intact, you’ll see we are still only looking at Lucy’s, and even better we can see the title of our best customer is now corrected:

clip_image004

If you’re using the same sheet from the previous lessons, you will also notice the calculated column we’d put in place, FullName, is still present.

clip_image005

A word of caution.

When using the Refresh feature, be aware that every time you do a refresh you are pulling the entire dataset back, not just the changed rows. Using SQL Server Profile I monitored the Refresh operation from the server side, and found the query issues was:

SELECT [SalesLT].[Customer].*
  FROM [SalesLT].[Customer]

If you were to attempt this on a large table in your production system during a busy production time it could have severely negative consequences. Imagine the slowdown that would occur if this were executed against a system having hundreds of millions of rows.

When getting into data sets that large, I would advise moving towards a more structured data warehousing environment. This would provide a structure better suited for analysis. If nothing else though, get IT to setup a copy of the production system in a location updated during off peak hours so that analysts could pull massive amounts of data without impacting production systems.

Summary

Taking into account the strain on the source systems, the easy one button refresh does provide a great way for analysts to ensure the data they are making business decisions from is current and up to date.

Lookups in PowerPivot Calculated Columns – Using the RELATED Function

In my previous post we looked at how Calculated Columns work in PowerPivot. One limitation you may have noticed though was all of the calculations were done using values in that individual table. What if you wanted to lookup a value in second table, based on a value in the first table, and return a value from that second table. Yes Virginia, not only is there a Santa Clause but there’s also an answer for us in PowerPivot’s RELATED function.

In addition to the standard Excel functions, PowerPivot provides a set of its own functions for working with its data. These new functions are collectively known as Data Analysis eXpressions, or DAX for short. By now you’ve probably guessed that the first function from the DAX toolbox you’ll want to learn is the RELATED function.

Let’s start with the same Excel 2010 workbook we had at the end of the lesson Combining Data from Multiple Sources in PowerPivot. If you recall we had imported data from the AdventureWorksLT2008 database. To that we added data the CountryInfo table, which we’d typed in to an Excel spreadsheet.

clip_image001

At the time we used this to get the CountryAbbr column, and you may have wondered why we also included a DiscountRate column. It’s this lesson where this DiscountRate will come in to play.

If you recall from that post, we used PowerPivot’s Manage Relationships feature to create a link that ultimately connected the SalesOrderDetail table to the CountryInfo table. This groundwork enables us to lookup values very easily. Here is a simple example. Go to the Add Column column of SalesOrderDetail and enter this formula into the fx area:

=RELATED(‘CountryInfo’[CountryAbbr])

When PowerPivot gets done, the abbreviation for each country to be shipped to appears in this column. This can be used to “flatten out” some of your data. However it’s much more useful when used as part of a calculation. Delete the column we just added (right click on the column header and pick Delete Column from the menu).

Looking in the CountryInfo table we see the DiscountRate. A value of 0.04 means our US customers get a discount of 4% off their LineTotal. So in our SalesOrderDetail table we want to take the LineTotal and calculate a new value based on the rate, which is stored in the CountryInfo table. Simple enough using the PowerPivot RELATED function.

=[LineTotal] * (1 – RELATED(‘CountryInfo’[DiscountRate]))

This is fairly simple math, we take the DiscountRate from CountryInfo (for US, 0.04) and subtract from 1 giving us 0.96, or 96%. This is then multiplied by the LineTotal giving us our new discounted LineTotal amount, which I renamed to DiscountedLineTotal.

clip_image002

Hopefully these simple examples will give you a glimpse of the immense power RELATED offers. It will allow you to combine data from multiple tables to create in-depth analysis that previously required a specialist in Business Intelligence to create.

Calculated Columns in PowerPivot

When importing data into PowerPivot, users often find the data is almost, but not quite what they need. Perhaps the name is not quite formatted as they need, or some calculation, not important in the stored data but very important to their work, is missing. For these situations PowerPivot offers Calculated Columns.

Calculated Columns provide a way for users to add that missing information they require into the source data. The calculations are done on a row by row basis, if you want to do something on the entire table, for example count the number of rows, you will instead need to create a measure in your PivotTable or PivotChart. Measures will be covered in a later post.

Let’s get started by using the same Excel 2010 workbook we ended with in the previous blog post. If you haven’t seen it, please go back and reference my post Combining Data from Multiple Sources in PowerPivot for the full details.

Our first task will be to address our customer names. In the source data, names are broken into five columns: Title, FirstName, MiddleName, LastName, and Suffix. For ease of use we wish to combine these distinct columns into one single column. Assuming you have opened the PowerPivot workbook, select the Customer table from the list of tabs at the bottom. Now go to the right-most column, ModifiedDate. Next to it you’ll see a blank column with the header “Add Column”. Click in it, then go up to the fx box right above the data.

clip_image001[6]

The formula bar:

clip_image002[6]

Into this formula bar we can create some fairly complex expressions. Let’s do one that shows some of the power of text formulas. Into the formula bar enter:

=[Title] & " " & [FirstName] & " " & IF(LEN([MiddleName]) > 0, [MiddleName] & " ", "") & [LastName] & IF(LEN([Suffix]) > 0, " " & [Suffix], "")

As with Excel, formulas need to begin with the equal sign. All literal string values are enclosed in double quote marks. Here we have two, a single space in the form of “ “ and an empty string in the form of “” (two double quotes right next to each other). The ampersand & character is used for concatenation. When using column names in the formulas, they must be enclosed in square brackets [ ] . Finally notice we’ve leveraged some standard Excel functions, first the LEN function which returns the length of the past in field. Then the IF function which evaluates the first statement (for example, LEN([MiddleName]) > 0 ). Then the area after the first comma ([MiddleName] & " " ) is returned if the statement was true, otherwise the area after the second comma ("" ) is returned.

After pressing enter on our formula PowerPivot will then calculate the values for each individual row in the dataset. The downside is this could take quite a while depending on the size of your data. A 100 million rows of data is going to take a while, even on a fast machine. The benefit though is this is the only time the calculation is done, unless of course the underlying data changes. Values are now calculated and available at analysis time.

You may notice the column name changes from Add Column to CalculatedColumn1. Since this is not something we’d want to show other users, or work with ourselves, simply right click on the column header, pick Rename Column, and give our new column a meaningful name. In this example I used FullName.

clip_image003[6]

A quick side note, in the sample data the MiddleName and Suffix columns are not populated very often, as is often true with real data. However it can make browsing through our data a bit difficult. To validate our calculation, click the drop down menu triangle next to the MiddleName column, go to the bottom and uncheck the “Blank” option for data filtering. This will then remove all rows from the viewed data that are missing a middle name.

Note this doesn’t delete the rows, this is merely a filtering option in PowerPivot to help you view only the data you want. The other rows are still there, to prove it just click the menu arrow again and pick the “Clear filter from MiddleName” menu option and all rows will again be visible. For more information on filtering, see my post Import Filters in PowerPivot. The same filtering tools that apply to the data import process also work once the data is imported.

In addition to textual manipulation, PowerPivot also supports complex math calculations. Let’s do a simple example in the SalesOrderDetail tab. For simplicity, let’s decide that our base profit for any sale is 20 percent of the Line Total. However, for each item ordered we gain an extra 2 percent of profit. We’ll click in the Add Column area of the SalesOrderDetail tab and enter the following calculation:

=(.2 + ([OrderQty] * .02))*[LineTotal]

Now we can rename the column to EstimatedProfit using the rename menu option as described above.

clip_image004[6]

We also have the power of the Excel math functions at our disposal. Let’s do something simple, and decide that we want to round the value of our EstimatedProfit column up to the next whole value. Even if the value was 1.01, it would round up to 2 dollars. To accomplish this we can use Excel’s ROUNDUP function:

=ROUNDUP((.2 + ([OrderQty] * .02))*[LineTotal], 0)

Yields these new results:

clip_image005[6]

As you can see, they have indeed been rounded up to the next whole value. The 0 at the end of the formula indicated how many decimals should remain, I indicated none so we could see the results in whole dollars.

We’ve only just begun to explore the value in Calculated Columns. Not only can they fill in missing data, but they can also speed calculations when you reach the Pivot Table stage of your analysis by making aggregations much easier.

Combining Data from Multiple Data Sources in PowerPivot

Seldom does the user of PowerPivot have all of the data they need in one nice, neat data source. More than often it will be necessary to import data from a variety of sources and make that data work together. It’s time to start building on what we’ve learned over the last few days to accomplish this feat.

First, launch Excel 2010 and use the PowerPivot import wizard to import the following tables from the AdventureWorksLT2008 database: Address, Customer, CustomerAddress, Product, ProductCategory, SalesOrderDetail, SalesOrderHeader. (Note, for a refresher on importing data please see my blog post, Import Filters in PowerPivot.)

Now we need a second source of data. Follow the instructions in my post Creating Tables in PowerPivot to enter the data below into Excel 2010, copy and paste it into a new PowerPivot table.

clip_image001

If you recall when we import data from a relational database, PowerPivot examines the foreign key relationships found in the database to create relationships between the tables it imports. In this situation though, the CountryInfo data didn’t get imported from a database, instead it was pasted in from a manually entered spreadsheet. Thus, PowerPivot has no information with which it can implicitly create a relationship.

We do want to create one however, so we can link the longer country name in the Address table to the CountryInfo data and thus be able to use the briefer country abbreviations. As PowerPivot was designed to work with many sources of data, it has an easy way to create these relationships.

In the PowerPivot window, click on the Table tab at the very top. All the way to the right you will notice a button group named Relationships. Click the Create Relationship button.

clip_image002

As the above dialog shows, this allows you to create a relationship, or a link between two tables in PowerPivot. Here we are creating a link between the Address table and the CountryInfo table on the CountryRegion field. When complete just click Create to create the relationship.

If you want to verify the relationship was indeed created, or review any of the relationships PowerPivot inferred when it imported the tables from the AdventureWorksLT2008 database, just click the Manage Relationships button in the Table Toolbar’s Relationships group.

clip_image003

On the very first row you’ll see the newly created relationship between the Address and CountryInfo tables. You’ll also see the other relationships that were created during the import process from the SQL Server database. The three buttons at the top let us Create new relationships, Edit existing ones, or Delete ones no longer needed. Note that the altering or deleting of relationships has no effect what so ever on the original source data (SQL Server or the Excel 2010 spreadsheet). It only affects the tables as stored in PowerPivot.

Now let’s see the new relationship in action. Close the Manage Relationships window, and on the PowerPivot Home tab create a new PowerPivot table (Pivot Table, Single Pivot Table). Go ahead and put it in a new worksheet.

In the Gemini Task Pane, go to the SalesOrderHeader table and drag the LineTotal field into the Values area. Next, drag the Name field from the Product table into the Row Labels area. Now for the magic, in the CountryInfo table drag the CountryAbbr field into the Column Labels area. Your pivot table should look something like this:

clip_image004

Because of the relationships that were inferred or that we created, PowerPivot was able to link the data like so:

1. SalesOrderDetail linked to SalesOrderHeader on the SalesOrderID column.

2. SalesOrderHeader linked to Address on the ShipToAddressID=AddressID.

3. Address linked to ContryInfo on the CountryRegion column.

To validate this for yourself, just return to PowerPivot and look at the Manage Relationships dialog to see all the links.

The need to combine data from many sources is a common task, one that will most certainly be done by users of PowerPivot. Using the techniques shown here, you can create and manage the relationships that will link data from these disparate sources together and leverage the power of PowerPivot.

Creating Tables in PowerPivot

PowerPivot has the ability to import data from a wide variety of sources. But you could run across a situation where you don’t have that data stored anywhere. Perhaps it’s on a piece of paper, or in a text file, or it’s just in the user’s brain and needs to be typed in. Logically then you would want to create a new table in PowerPivot.

Except you can’t. PowerPivot itself doesn’t provide the ability to create tables and enter data directly into it. Now, before you start the usual rending of garments and gnashing of teeth plus a little wailing, there is a simple to implement solution.

Create a new Excel 2010 workbook. In sheet 1 (or any sheet) let’s enter the following information.

clip_image001

Now highlight the above cells and Copy them to the clipboard. Next, launch the PowerPivot window by going to the PowerPivot tab in Excel 2010 and clicking the PowerPivot window button.

Once PowerPivot is open, if you look in the middle group of buttons you’ll see a set named Paste from Clipboard The To New Table button should be activated now that you have data in your clipboard.

clip_image002

Click the To New Table button. When you do, the Paste Preview dialog appears.

clip_image003

This is similar to the preview window you see with the Import Table wizard, only not quite as much functionality. Here, we can view the data and validate that it is correct, which it is. We can also indicate if the first row contains our column headers, which in our case it does so we can just leave that option checked on. Click OK to import the data.

clip_image004

Above is our new data, now pasted into PowerPivot. We have the same abilities with it we have with any other table, we can sort, rename our columns, add new calculated columns, and more. As you will note from the tab at the bottom of the picture, the data was pasted into a table with the rather uninformative name of Table. We can do better than that, so right click on the Table tab and pick Rename from the menu. Overwrite Table with CountryInfo.

clip_image005

Now you can see how easy it is to create new data from scratch and paste it into PowerPivot. In this example I used a limited number of rows for illustrative purposes, but it’s quite possible to import massive amounts of data. In addition, you can add to your table later. In this example all we would have had to do is Paste Append from the toolbar.

In the next blog post we’ll build on what we’ve learned and look at how to combine data imported from multiple sources.

Import Filters in PowerPivot

PowerPivot has the ability to import millions of rows of data into Excel 2010 for purposes of analyzing, slicing and dicing. However, even though you can import vast amounts of data you may not always want to. There are many reasons for this.

You may wish to limit the amount of data for security reasons. Perhaps bringing in all of the data may put an unneeded strain on the server. Most likely though, is you simply do not need all of the data. As an analyst you may be interested in only the data for a segment of your organization, such as a single plant or department.

Fortunately filtering data is an easy task. If you are not familiar with importing data into PowerPivot, may I suggest you first review my step by step posted yesterday.

Launch Excel 2010 and go to the PowerPivot tab, then launch the PowerPivot window. Select the "From Database" to start the import process, and for this example we’ll use SQL Server. Now, like in my previous blog post enter your credentials to connect to a SQL Server. For this example we’ll just pick one table, SalesOrderDetail from the AdventureWorksLT2008 database.

With the SalesOrderDetail picked, look in the lower right side of the dialog. You will see a button labeled "Preview & Filter".

clip_image001

When you click that button you should see a new dialog appear.

clip_image002

Here you can see a subset of your data, only the first few rows. It’s enough to give you an idea of what the data looks like, but it won’t show you every single row. Considering the fact that PowerPivot is capable of importing millions of rows, this is probably a good thing.

Within this dialog we can do some pretty powerful things. Let’s start by eliminating a column we don’t need in the data we want to import. Scroll to the right until you see the rowguid column and uncheck the box next to the column header, as you see below.

clip_image003

By unchecking this box this column will not be imported into our Excel 2010 PowerPivot table.

Recall though that the name of this dialog was Preview & Filter. We can also do some review of our data to ensure it’s what we want. Let’s say we want to look to see the range of values for our line totals. Click the downward facing triangle button to the right of the LineTotal column header. A drop down menu will appear. Select "Sort Smallest to Largest"

clip_image004

Scrolling through the data you’ll notice that the data has indeed been sorted. Also note the menu icon to the right of the column name changes to indicate a sort has been applied to this column. (Remember though it’s not showing all rows, just the first few sorted in order.) An important thing to note tough, this sorting applies only to the data as you see it in the Preview & Filter area, once you click the Finish button on the Table Import Wizard the sort is removed. While the preview options in this dialog are not saved, filters are. If you had pressed Finished, you’d have seen that rowguid is not included in the result set. But don’t click Finish yet, we’re not quite done filtering.

For our next filter, let’s decide for purposes of this report we are only interested in large orders. We’ll define large as "Line Total greater than 1,000 dollars". Once again open the menu to the right of the LineTotal column header and select "Number Filters". A pop out menu will appear, from it select "Greater Than…"

clip_image005

When the Custom Filter pops up, enter 1000 next to the is greater than box and click OK.

clip_image006

Clicking OK will reveal lesser amounts have been removed. Other filters besides numeric filters are available. For text data, you have "is equal to" and "is not equal to" available. Date filters work the same, having the "is equal to" and "is not equal to" available. I’m hopeful that for date types further functionality will be added in the future, such as date ranges or "is greater/less than" types of functionality.

Let’s click the OK button on the Preview & Filters window to return to the Table Import Wizard. You’ll now see an Applied filters link in the row with the table name, click it to see what filters are in effect.

clip_image007

When you click the link you’ll see:

clip_image008

While you cannot edit this information, it is nice to see it all in one location. Click OK to close and return to the previous window.

Click Finish to complete the import. You will see the data you asked for, only rows with LineTotal greater than 1000 and without the rowguid column.

Hopefully you’ve seen how powerful the filtering tools included with PowerPivot are. Using them you can remove unwanted rows and columns, limiting not only the amount of data you have to pull across the network but that you have to store locally in the PowerPivot Excel 2010 spreadsheet. Limiting your data will ensure only the rows required for the analysis are included, saving time and enhancing security.

Follow

Get every new post delivered to your Inbox.

Join 93 other followers