When importing data into PowerPivot, users often find the data is almost, but not quite what they need. Perhaps the name is not quite formatted as they need, or some calculation, not important in the stored data but very important to their work, is missing. For these situations PowerPivot offers Calculated Columns.
Calculated Columns provide a way for users to add that missing information they require into the source data. The calculations are done on a row by row basis, if you want to do something on the entire table, for example count the number of rows, you will instead need to create a measure in your PivotTable or PivotChart. Measures will be covered in a later post.
Let’s get started by using the same Excel 2010 workbook we ended with in the previous blog post. If you haven’t seen it, please go back and reference my post Combining Data from Multiple Sources in PowerPivot for the full details.
Our first task will be to address our customer names. In the source data, names are broken into five columns: Title, FirstName, MiddleName, LastName, and Suffix. For ease of use we wish to combine these distinct columns into one single column. Assuming you have opened the PowerPivot workbook, select the Customer table from the list of tabs at the bottom. Now go to the right-most column, ModifiedDate. Next to it you’ll see a blank column with the header “Add Column”. Click in it, then go up to the fx box right above the data.
The formula bar:
Into this formula bar we can create some fairly complex expressions. Let’s do one that shows some of the power of text formulas. Into the formula bar enter:
=[Title] & " " & [FirstName] & " " & IF(LEN([MiddleName]) > 0, [MiddleName] & " ", "") & [LastName] & IF(LEN([Suffix]) > 0, " " & [Suffix], "")
As with Excel, formulas need to begin with the equal sign. All literal string values are enclosed in double quote marks. Here we have two, a single space in the form of “ “ and an empty string in the form of “” (two double quotes right next to each other). The ampersand & character is used for concatenation. When using column names in the formulas, they must be enclosed in square brackets [ ] . Finally notice we’ve leveraged some standard Excel functions, first the LEN function which returns the length of the past in field. Then the IF function which evaluates the first statement (for example, LEN([MiddleName]) > 0 ). Then the area after the first comma ([MiddleName] & " " ) is returned if the statement was true, otherwise the area after the second comma ("" ) is returned.
After pressing enter on our formula PowerPivot will then calculate the values for each individual row in the dataset. The downside is this could take quite a while depending on the size of your data. A 100 million rows of data is going to take a while, even on a fast machine. The benefit though is this is the only time the calculation is done, unless of course the underlying data changes. Values are now calculated and available at analysis time.
You may notice the column name changes from Add Column to CalculatedColumn1. Since this is not something we’d want to show other users, or work with ourselves, simply right click on the column header, pick Rename Column, and give our new column a meaningful name. In this example I used FullName.
A quick side note, in the sample data the MiddleName and Suffix columns are not populated very often, as is often true with real data. However it can make browsing through our data a bit difficult. To validate our calculation, click the drop down menu triangle next to the MiddleName column, go to the bottom and uncheck the “Blank” option for data filtering. This will then remove all rows from the viewed data that are missing a middle name.
Note this doesn’t delete the rows, this is merely a filtering option in PowerPivot to help you view only the data you want. The other rows are still there, to prove it just click the menu arrow again and pick the “Clear filter from MiddleName” menu option and all rows will again be visible. For more information on filtering, see my post Import Filters in PowerPivot. The same filtering tools that apply to the data import process also work once the data is imported.
In addition to textual manipulation, PowerPivot also supports complex math calculations. Let’s do a simple example in the SalesOrderDetail tab. For simplicity, let’s decide that our base profit for any sale is 20 percent of the Line Total. However, for each item ordered we gain an extra 2 percent of profit. We’ll click in the Add Column area of the SalesOrderDetail tab and enter the following calculation:
=(.2 + ([OrderQty] * .02))*[LineTotal]
Now we can rename the column to EstimatedProfit using the rename menu option as described above.
We also have the power of the Excel math functions at our disposal. Let’s do something simple, and decide that we want to round the value of our EstimatedProfit column up to the next whole value. Even if the value was 1.01, it would round up to 2 dollars. To accomplish this we can use Excel’s ROUNDUP function:
=ROUNDUP((.2 + ([OrderQty] * .02))*[LineTotal], 0)
Yields these new results:
As you can see, they have indeed been rounded up to the next whole value. The 0 at the end of the formula indicated how many decimals should remain, I indicated none so we could see the results in whole dollars.
We’ve only just begun to explore the value in Calculated Columns. Not only can they fill in missing data, but they can also speed calculations when you reach the Pivot Table stage of your analysis by making aggregations much easier.