If you remember my Intro to PowerPivot post, you’ll recall that every organization has that power user who leveraged the tools he had on hand, mostly Excel, to do their own form of Business Intelligence. One headache that plagued “that guy” was the freshness of data. It could be quite difficult to keep the data used in his or her spreadsheets up to date. One of the great boons that PowerPivot offers is the ability to quickly and easily refresh data.
Once again let’s return to the Excel 2010 spreadsheet we created in my Combining Data from Multiple Sources in PowerPivot. In this example we’ll put ourselves in the role of a salesperson, creating an analysis for one of our best customers, one Lucy Harrington. Going to the Customer tab in our PowerPivot workbook, we click on the drop down menu for the FirstName column. Clear the “Select All” option then scroll down and only check the first name “Lucy” which should be sufficient to find only the customer we wish.
In looking at her record, we are horrified to see that some nit-wit in data entry had improperly given our best customer the title of “Mr.” even though we know Lucy is not only a “Mrs.” but as a newlywed very insistent on the use of the Mrs. title.
Correcting this is easy enough, in the source system we can go make the fix, or perhaps we bribe the DBA with lunch so he’ll make our data fix a high priority and issue this update command to the database:
UPDATE SalesLT.Customer SET Title = 'Mrs.' WHERE FirstName = 'Lucy'
So now how do we refresh our data in PowerPivot? Easy enough. In PowerPivot on the Home tab you’ll notice a Refresh button in the Connections group. Clicking on it gives us two options, one to refresh just the current table we are viewing, another to refresh all of the tables in our PowerPivot work area. For this example we’ll pick just the Refresh command.
Once you pick Refresh the Data Refresh dialog appears to let you know of its progress. When done it will let you know of its success and you can then click the Close button to close the window. At this point you’ll see some behavior that I can only attribute to the fact that as of this writing PowerPivot is still in the CTP stage. My screen flashed a few times, then when it was done I was on the last tab in my PowerPivot workbook, the CountryInfo tab. Simply click on the Customer tab to return.
A great thing about the Refresh is that it left our filters intact, you’ll see we are still only looking at Lucy’s, and even better we can see the title of our best customer is now corrected:
If you’re using the same sheet from the previous lessons, you will also notice the calculated column we’d put in place, FullName, is still present.
A word of caution.
When using the Refresh feature, be aware that every time you do a refresh you are pulling the entire dataset back, not just the changed rows. Using SQL Server Profile I monitored the Refresh operation from the server side, and found the query issues was:
SELECT [SalesLT].[Customer].* FROM [SalesLT].[Customer]
If you were to attempt this on a large table in your production system during a busy production time it could have severely negative consequences. Imagine the slowdown that would occur if this were executed against a system having hundreds of millions of rows.
When getting into data sets that large, I would advise moving towards a more structured data warehousing environment. This would provide a structure better suited for analysis. If nothing else though, get IT to setup a copy of the production system in a location updated during off peak hours so that analysts could pull massive amounts of data without impacting production systems.
Taking into account the strain on the source systems, the easy one button refresh does provide a great way for analysts to ensure the data they are making business decisions from is current and up to date.