Introduction to DW/BI–My Newest Old Course at Pluralsight

Way back in November of 2010, I published my second course with Pluralsight, Introduction to Data Warehousing and Business Intelligence. Over that time a lot has transpired. While the basic fundamentals haven’t changed, a lot of technology has. In addition, Microsoft’s WideWorldImporters sample provides a great platform for demonstrating many of the concepts.

Additionally, I have to admit that after eight years the slides and accompanying graphics have started to look a bit dated. As a result, Pluralsight asked me to update the course.

I’m happy to announce the new and improved version of Introduction to Data Warehousing and Business Intelligence is now live at Pluralsight!

In case the title didn’t give it away, in this course I cover all the basics of data warehousing and business intelligence. What are facts and dimensions, and what do they actually look like in the database? How should you design a data warehouse?

All new to this course, you’ll see the various tools  from Microsoft to do DW/BI, using the all new Wide World Importers sample database as our platform. You’ll see how to design a database using SSDT, and see how the concepts of DW/BI were implemented.

You’ll also see how SSIS was used to achieve ETL, as well as SSAS to create an analytic cube to do BI. Multiple tools are then explored for reporting; SSRS, Mobile Report Publisher, Excel, and PowerBI.

So, who is this course for?

As a DBA, you’ll be asked to implement new data warehouse projects. The design of a data warehouse is very different from a traditional database, and it’s important to understand these differences. In addition, you’ll be asked to install and configure the data warehouse tools. We’ll talk about the tools included with SQL Server, such as SQL Server Integration Services, Analysis Services, and Reporting Services.

If you are a database designer or developer, you’ll be asked to design these data warehouses. As we said, the design is VERY different from a traditional database. You need to understand these differences in order to implement a data warehouse.

Software developers are interacting with data warehouses on an increasing basis, in order to get additional data for their programs. In the past developers had to create custom interfaces to other systems in order to get additional information needed for their applications. In today’s world developers are being told to get that required data from the central repository of the data warehouse.

Finally, this course is valuable for project managers and business users who seek to understand the systems they can pull data from in order to do self service reporting. By the end of the course users will understand what terms like facts and dimensions mean, and how to effectively use them in your reporting.

If you aren’t already a Pluralsight customer, but are interested, just shoot me an email: free @ arcanetc.com. I can hook you up with a code which will give you 30 days of access to their site, during which you can watch my courses, or any of the courses on Pluralsight. S

Advertisement

Dealing with the Date Dimension Deployment Dilemma

For those of you who have routines that load up a date dimension, you know that it can be a little slow to run. There are times when I am deploying my database project (hey, you are using SSDT SQL Server Database Tools SQL Server Projects aren’t you???) and I want to recreate the database from scratch. While it is a great opportunity to grab another cup of coffee, tea, or read through a fantastic blog like this one, after doing this a few times you wind up with a bad case of caffeine jitters.

I had a date dimension to load today that was particularly challenging. They legitimately needed a couple hundred years worth of dates, plus there were some special calculated columns that needed to be populated. All total it took about 25 minutes to populate, on a reasonably decent sized server. Ouch. So I came up with an alternate solution that I thought I’d share with you, my adoring public (at least is seems like it from the spam comments that appear as love letters lol).

I created a second database project as part of my same solution. You can name it something generic, like “DateData”, or if you are using a date routine specific to each project, you can name it after your project database with something like “Dates” or “DateData” at the end.

This new project has two files. One is the create table script to create the only table, DimDate. The second is a post deployment script, in which I just copied over the original date dimension population script from the original project. I was able to deploy this, waiting the 25 minutes or so for it to populate my date dimension.

Back in the main database project, I deleted the code in the DimDate population post deployment script, and simply put in an INSERT statement to insert the rows in the DateData’s DimDate into the main projects DimDate. Load time was maybe 50 seconds, although it may have been shorter as I looked away for a moment.

The beauty of this is I only had to do a real load of DimDate once, since the date dimension isn’t likely to change during development, or generally at any point. Once your project is complete and on-line, you can even delete the DateData database. Should you need it again you can simply recreate it from the DateData project that was part of your solution. I can now do a full drop and create deployment (aka Publish in SSDT) as often as I want without risking caffeine poisoning or being condemned to staying awake all night doing a Phineas and Ferb marathon (which actually sounds kinda fun, now I wish I hadn’t fixed this issue. Oh well.)

Updating AdventureWorksDW2012 for Today

Like many of my fellow MVPs and Presenters, I use the Adventure Works sample data from Microsoft to do my presentations. Being a BI guy, I specifically use the AdventureWorksDW2012 version, the Data Warehouse of Adventure Works. I think you’d agree though it’s gotten a little long in the tooth. All of dates range from 2005 to 2008. This is especially irritating when demonstrating features reliant on the current date ( think GETDATE() or NOW() ).

Before you read further, let me stress again this is NOT for the typical AdventureWorks2012 database. This script is for the Data Warehouse version, AdventureWorksDW2012.

I scoured the search engines but couldn’t find anyone who had taken time to come up with a way to update the database. Finally fed up, I did it myself. Below is a script which will add five years to each date in AdventureWorksDW2012. 2008 becomes 2013, 2007 becomes 2012, and so on. The script, below, turned out to be pretty simple.

Before you begin though, a few prerequisites. First, you will need to have AdventureWorksDW2012 installed on your system. A friend and co-worker, Bradley Ball (@SQLBalls | blog ) pointed out one issue which I’ll pass along. He had some issues with the version of AdventureWorksDW2012 located at http://msftdbprodsamples.codeplex.com/releases/view/55330. When he just grabbed the mdf file and tried to create the database using the attach_rebuild_log option it came out corrupted. Instead he suggested the version stored at http://www.wrox.com/WileyCDA/Section/Wrox-Books-Using-the-SQL-Server-2012-RTM-Database-Examples-Download.id-811144.html?DW_1118479580.zip. (I don’t think Wrox will mind, as I and many of my co-workers have written books for them, nice folks.)

Next, please note this script was written with SQL Server 2012 in mind. It could easily be adapted for 2008R2 by tweaking a few paths. Speaking of which, I use the default paths for everything, you’ll need to alter if you used other paths.

Not wanting to mess with the original AdventureWorksDW2012, in Step 1 (these steps are numbered in the script below) I make a backup of the existing 2012 version. I then do a restore, renaming it to AdventureWorksDW2013. Be warned, if you have run this before and AdventureWorksDW2013 exists it will be deleted. This might be good if you want an easy way to reset your 2013 version, if not alter the script for your needs.

Later I will be inserting dates. I have a handy little routine that converts a traditional datetime data type to an integer, using the traditional YYYYMMDD common for data warehouse date keys. I probably could have done this using some version of FORMAT but I already had the routine written so I just grabbed and reused it. Note it also does some bounds checking, etc that really wasn’t needed here, but like I said I did a grab and reuse. So in Step 2 I create the function.

In step 3 I tackle the biggest task of inserting new rows into the date dimension. The DimDate table already had dates through the end of 2010, so I only had to generate 2011-2013. Inside a WHILE loop I iterate over each date individually, do the calculations to break out the various pieces of a date such as month number, quarter number, etc, and do an INSERT into the DimDate table. If you recall, the DimDate table in AdventureWorks has mult-language versions of the month and day names. I simply read the existing ones into table variables, then in the SELECT part of the INSERT INTO… SELECT statement do a join to these two table variables.

Of course to do that, I had to have a table to select from. None of my date data though existed in the table, each piece of data was generated from the CurrentDate variable. So I simply created a third table variable named BogusTable, and inserted a single row in it. This gave me something to join the month and day name tables to. I suppose I could have used CASE statements for each of the names, but this was more fun.

With the dates added to DimDate, it was time to move on to the Fact tables. In some cases it was very simple. For example, in Step 4.1 I just add 50,000 to the date key. Why 50,000? Simple date math. The dates are integers, 20080101 is really 20,080,101. To bring it up to 2013, I simply added 50,000, thus 20,013,101 or 20130101.

The two Sales fact tables had dates on leap year from 2008. To fix those I simply backed those up a day, shifting them to February 28th. I took a slightly different approach with the Currency Rate fact table, simply shifting the 2008 leap year to 2012 leap year, then omitting February 29th from the rest of the update. Also note that on this and the Product Inventory table, the Date Key was actually part of the Primary Key of the tables. Thus I had to first drop the Primary Key, make the changes to the dates, then recreate the Primary Key.

One last note on the Fact tables, all of the dates in the Call Center table were set to 2010. For those I merely added 30,000, shifting them from 2010 to 2013. (Don’t ask me why those have 2010 dates when the rest of the sample data is 2005-2008. I have not a clue.)

As a last and final step, Step 5, I drop the little helper function DateToDateId I created way back in Step 2. And that’s it! You now have a handy demo / practice database with dates that are actually current.

A big thanks to my co-workers at Pragmatic Works (@PragmaticWorks | http://pragmaticworks.com ) for helping me test this out and making sure it worked with their stuff.

Enjoy!

 

PS Most browsers don’t seem to render the code in a monospace font. Be assured when you paste into SSMS everything should line back up again, assuming of course you use a monospace font in SSMS.

 

/*-----------------------------------------------------------------------------------------------*/
/* Updating AdventureWorks2012 for Today */
/* */
/* Robert C. Cain, http://arcanecode.com @ArcaneCode */
/* */
/* Script Copyright (c) 2013 by Robert C. Cain */
/* AdventureWorks database Copyright (c) Microsoft. */
/* */
/* This script will make a backup of the AdventureWorks2012DW database, then copy and restore it */
/* as AdventureWorksDW2013. It will then update it for current dates. 2008 now becomes 2013, */
/* 2007 is now 2012, and so forth. This script is dependent on the AdventureWorks2012DW sample */
/* database already being installed. It won't change AdventureWorksDW2012 in anyway. */
/* */
/* Be warned, if AdventureWorksDW2013 exists, it will be deleted as part of this process. */
/* */
/*-----------------------------------------------------------------------------------------------*/

PRINT 'Updating AdventureWorks2012 for Today - Starting'
GO

/*-----------------------------------------------------------------------------------------------*/
/* Step 1 - Make a copy of AdventureWorksDW2012 and restore as AdventureWorksDW2013 */
/*-----------------------------------------------------------------------------------------------*/
SET NOCOUNT ON

USE [master]

-- Step 1.1. Make a backup of AdventureWorksDW2012 ----------------------------------------------
PRINT 'Backing up AdventureWorksDW2012'
GO

BACKUP DATABASE [AdventureWorksDW2012]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\AdventureWorksDW2012.bak'
WITH NOFORMAT,
INIT,
NAME = N'AdventureWorksDW2012-Full Database Backup',
SKIP,
NOREWIND,
NOUNLOAD,
STATS = 10
GO


-- Step 1.2. Delete the database AdventureWorksDW2013 if it exists ------------------------------
PRINT 'Deleting AdventureWorksDW2013, if it exists'
GO

IF (EXISTS (SELECT 1
FROM master.dbo.sysdatabases
WHERE name = 'AdventureWorksDW2013' )
)
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'AdventureWorksDW2013'
GO

IF (EXISTS (SELECT 1
FROM master.dbo.sysdatabases
WHERE name = 'AdventureWorksDW2013' )
)
DROP DATABASE [AdventureWorksDW2013]
GO

-- Step 1.3. Restore the database to a new copy -------------------------------------------------
PRINT 'Restoring AdventureWorksDW2012 to AdventureWorksDW2013'
GO

RESTORE DATABASE [AdventureWorksDW2013]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\AdventureWorksDW2012.bak'
WITH FILE = 1,
MOVE N'AdventureWorksDW2012_Data'
TO N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorksDW2013_Data.mdf',
MOVE N'AdventureWorksDW2012_Log'
TO N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorksDW2013_log.ldf',
NOUNLOAD, STATS = 5

GO

PRINT 'Done Creating AdventureWorksDW2013'
GO



/*-----------------------------------------------------------------------------------------------*/
/* Step 2. Create a helper function to convert dates to a YYYYMMDD format Date Id. */
/*-----------------------------------------------------------------------------------------------*/

USE [AdventureWorksDW2013]
GO

IF EXISTS (SELECT [name] FROM [sys].[all_objects] WHERE [name] = 'DateToDateId')
DROP FUNCTION [dbo].[DateToDateId];
GO

CREATE FUNCTION [dbo].[DateToDateId]
(
@Date DATETIME
)
RETURNS INT
AS
BEGIN

DECLARE @DateId AS INT
DECLARE @TodayId AS INT

SET @TodayId = YEAR(GETDATE()) * 10000
+ MONTH(GETDATE()) * 100
+ DAY(GETDATE())

-- If the date is missing, or a placeholder for a missing date, set to the Id for missing dates
-- Else convert the date to an integer
IF @Date IS NULL OR @Date = '1900-01-01' OR @Date = -1
SET @DateId = -1
ELSE
BEGIN
SET @DateId = YEAR(@Date) * 10000
+ MONTH(@Date) * 100
+ DAY(@Date)
END

-- If there's any data prior to 2000 it was incorrectly entered, mark it as missing
IF @DateId BETWEEN 0 AND 19991231
SET @DateId = -1

-- Commented out for this project as future dates are OK
-- If the date is in the future, don't allow it, change to missing
-- IF @DateId > @TodayId
-- SET @DateId = -1

RETURN @DateId

END

GO




/*-----------------------------------------------------------------------------------------------*/
/* Step 3. Add new dates to the dbo.DimDate table. */
/*-----------------------------------------------------------------------------------------------*/
PRINT 'Adding new dates to dbo.DimDate'
GO

SET NOCOUNT ON

-- Later we will be writing an INSERT INTO... SELECT FROM to insert the new record. I want to
-- join the day and month name memory variable tables, but need to have something to join to.
-- Since everything is calculated, we'll just create this little bogus table to have something
-- to select from.
DECLARE @BogusTable TABLE
( PK TINYINT)

INSERT INTO @BogusTable SELECT 1;


-- Create a table variable to hold the days of the week with their various language versions
DECLARE @DayNameTable TABLE
( [DayNumberOFWeek] TINYINT
, [EnglishDayNameOfWeek] NVARCHAR(10)
, [SpanishDayNameOfWeek] NVARCHAR(10)
, [FrenchDayNameOfWeek] NVARCHAR(10)
)

INSERT INTO @DayNameTable
SELECT DISTINCT
[DayNumberOFWeek]
, [EnglishDayNameOfWeek]
, [SpanishDayNameOfWeek]
, [FrenchDayNameOfWeek]
FROM dbo.DimDate

-- Create a month table to hold the months and their language versions.
DECLARE @MonthNameTable TABLE
( [MonthNumberOfYear] TINYINT
, [EnglishMonthName] NVARCHAR(10)
, [SpanishMonthName] NVARCHAR(10)
, [FrenchMonthName] NVARCHAR(10)
)

INSERT INTO @MonthNameTable
SELECT DISTINCT
[MonthNumberOfYear]
, [EnglishMonthName]
, [SpanishMonthName]
, [FrenchMonthName]
FROM dbo.DimDate

-- This is the start and end date ranges to use to populate the
-- dbo.DimDate dimension. Change if it's 2014 and you run across this script.
DECLARE @FromDate AS DATE = '2011-01-01'
DECLARE @ThruDate AS DATE = '2013-12-31'

-- CurrentDate will be incremented each time through the loop below.
DECLARE @CurrentDate AS DATE
SET @CurrentDate = @FromDate

-- FiscalDate will be set six months into the future from the CurrentDate
DECLARE @FiscalDate AS DATE

-- Now we simply loop over every date between the From and Thru, inserting the
-- calculated values into DimDate.
WHILE @CurrentDate <= @ThruDate
BEGIN

SET @FiscalDate = DATEADD(m, 6, @CurrentDate)

INSERT INTO dbo.DimDate
SELECT [dbo].[DateToDateId](@CurrentDate)
, @CurrentDate
, DATEPART(dw, @CurrentDate) AS DayNumberOFWeek
, d.EnglishDayNameOfWeek
, d.SpanishDayNameOfWeek
, d.FrenchDayNameOfWeek
, DAY(@CurrentDate) AS DayNumberOfMonth
, DATEPART(dy, @CurrentDate) AS DayNumberOfYear
, DATEPART(wk, @CurrentDate) AS WeekNumberOfYear
, m.EnglishMonthName
, m.SpanishMonthName
, m.FrenchMonthName
, MONTH(@CurrentDate) AS MonthNumberOfYear
, DATEPART(q, @CurrentDate) AS CalendarQuarter
, YEAR(@CurrentDate) AS CalendarYear
, IIF(MONTH(@CurrentDate) < 7, 1, 2) AS CalendarSemester
, DATEPART(q, @FiscalDate) AS FiscalQuarter
, YEAR(@FiscalDate) AS FiscalYear
, IIF(MONTH(@FiscalDate) < 7, 1, 2) AS FiscalSemester
FROM @BogusTable
JOIN @DayNameTable d
ON DATEPART(dw, @CurrentDate) = d.[DayNumberOFWeek]
JOIN @MonthNameTable m
ON MONTH(@CurrentDate) = m.MonthNumberOfYear

SET @CurrentDate = DATEADD(d, 1, @CurrentDate)
END
GO

-- If you want to verify you can uncomment this line.
-- SELECT * FROM dbo.DimDate WHERE DateKey > 20110000

PRINT 'Done adding new dates to dbo.DimDate'
GO





/*-----------------------------------------------------------------------------------------------*/
/* Step 4. Update the Fact Tables with the new dates. */
/*-----------------------------------------------------------------------------------------------*/


PRINT 'Update Fact Tables'
GO

SET NOCOUNT ON

-- To move forward five years, we simply add 50,000 to the date key

-- 4.1 FactFinance ------------------------------------------------------------------------------
PRINT ' FactFinance'
GO

UPDATE [dbo].[FactFinance]
SET [DateKey] = [DateKey] + 50000;


-- 4.2 FactInternetSales ------------------------------------------------------------------------
PRINT ' FactInternetSales'
GO

-- There are a few rows where the due date is on leap year. Update these to back off a day
-- so the date add works OK
UPDATE [dbo].[FactInternetSales]
SET [OrderDateKey] = 20080228
, [OrderDate] = '2008-02-28'
WHERE [OrderDateKey] = 20080229

UPDATE [dbo].[FactInternetSales]
SET [DueDateKey] = 20080228
, [DueDate] = '2008-02-28'
WHERE [DueDateKey] = 20080229

UPDATE [dbo].[FactInternetSales]
SET [ShipDateKey] = 20080228
, [ShipDate] = '2008-02-28'
WHERE [ShipDateKey] = 20080229

-- Now update the rest of the days.
UPDATE [dbo].[FactInternetSales]
SET [OrderDateKey] = [OrderDateKey] + 50000
, [DueDateKey] = [DueDateKey] + 50000
, [ShipDateKey] = [ShipDateKey] + 50000
, [OrderDate] = DATEADD(yy, 5, [OrderDate])
, [DueDate] = DATEADD(yy, 5, [DueDate])
, [ShipDate] = DATEADD(yy, 5, [ShipDate])


-- 4.3 FactResellerSales ------------------------------------------------------------------------
PRINT ' FactResellerSales'
GO

-- As with Internet Sales, there are rows where the due date is on leap year.
-- Update these to back off a day so the date add works OK
UPDATE [dbo].[FactResellerSales]
SET [OrderDateKey] = 20080228
, [OrderDate] = '2008-02-28'
WHERE [OrderDateKey] = 20080229

UPDATE [dbo].[FactResellerSales]
SET [DueDateKey] = 20080228
, [DueDate] = '2008-02-28'
WHERE [DueDateKey] = 20080229

UPDATE [dbo].[FactResellerSales]
SET [ShipDateKey] = 20080228
, [ShipDate] = '2008-02-28'
WHERE [ShipDateKey] = 20080229

-- Now update the table
UPDATE [dbo].[FactResellerSales]
SET [OrderDateKey] = [OrderDateKey] + 50000
, [DueDateKey] = [DueDateKey] + 50000
, [ShipDateKey] = [ShipDateKey] + 50000
, [OrderDate] = DATEADD(yy, 5, [OrderDate])
, [DueDate] = DATEADD(yy, 5, [DueDate])
, [ShipDate] = DATEADD(yy, 5, [ShipDate])

-- 4.4 FactSalesQuota ---------------------------------------------------------------------------
PRINT ' FactSalesQuota'
GO

UPDATE [dbo].[FactSalesQuota]
SET [DateKey] = [DateKey] + 50000

-- 4.5 FactSurveyResponse -----------------------------------------------------------------------
PRINT ' FactSurveyResponse'
GO

UPDATE [dbo].[FactSurveyResponse]
SET [DateKey] = [DateKey] + 50000

-- 4.6 FactCallCenter ---------------------------------------------------------------------------
PRINT ' FactCallCenter'
GO

-- All the rows in call center have a 2010 date, just add 3 years to make these 2013
UPDATE [dbo].[FactCallCenter]
SET [DateKey] = [DateKey] + 30000


-- 4.7 FactCurrencyRate -------------------------------------------------------------------------
PRINT ' FactCurrencyRate'
GO

-- Because the DateKey is part of the PK, we have to drop the key before we can update it
ALTER TABLE [dbo].[FactCurrencyRate] DROP CONSTRAINT [PK_FactCurrencyRate_CurrencyKey_DateKey]
GO

-- Shift the 2008 Leap Year days to 2012 Leap Year
UPDATE [dbo].[FactCurrencyRate]
SET [DateKey] = 20120229
WHERE [DateKey] = 20080229

-- Update everything except the leap year we fixed already
UPDATE [dbo].[FactCurrencyRate]
SET [DateKey] = [DateKey] + 50000
WHERE [DateKey] <> 20120229

-- Add the PK back
ALTER TABLE [dbo].[FactCurrencyRate]
ADD CONSTRAINT [PK_FactCurrencyRate_CurrencyKey_DateKey] PRIMARY KEY CLUSTERED
( [CurrencyKey] ASC,
[DateKey] ASC
)
WITH ( PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, IGNORE_DUP_KEY = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO


-- 4.8 FactProductInventory ---------------------------------------------------------------------
PRINT ' FactProductInventory'
GO

-- As with the previous step, the date is part of the primary key, so we need to drop it first.
ALTER TABLE [dbo].[FactProductInventory] DROP CONSTRAINT [PK_FactProductInventory]
GO

-- Shift the 2008 Leap Year days to 2012 Leap Year
UPDATE [dbo].[FactProductInventory]
SET [DateKey] = 20120229
WHERE [DateKey] = 20080229

-- Update everything except the leap year we fixed already
UPDATE [dbo].[FactProductInventory]
SET [DateKey] = [DateKey] + 50000
WHERE [DateKey] <> 20120229

-- Add the PK back
ALTER TABLE [dbo].[FactProductInventory]
ADD CONSTRAINT [PK_FactProductInventory] PRIMARY KEY CLUSTERED
( [ProductKey] ASC
, [DateKey] ASC
)
WITH ( PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, IGNORE_DUP_KEY = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO

PRINT 'Done updating the Fact tables'
GO



/*-----------------------------------------------------------------------------------------------*/
/* Step 5. Cleanup, remove the helper function we added earlier. */
/*-----------------------------------------------------------------------------------------------*/
PRINT 'Removing Helper Function'
GO

IF EXISTS (SELECT 1 FROM [sys].[all_objects] WHERE [name] = 'DateToDateId')
DROP FUNCTION [dbo].[DateToDateId];
GO

/*-----------------------------------------------------------------------------------------------*/
/* All done! */
/*-----------------------------------------------------------------------------------------------*/
PRINT 'Updating AdventureWorks2012 for Today - Completed'
GO

SSIS tip for Lookup Transformations

The Lookup Transformation is a cornerstone of almost any SSIS package. The vast majority of packages that load Fact tables use Lookups extensively. Many of these lookups reference the same tables over and over.

As we all (hopefully) know by now, opting to lookup against a table is akin to doing a SELECT *. Best practices guide you to only select the columns you really need for a lookup, typically the surrogate key and business key. Because I often reference the same tables over and over, I’ve taken to keeping all my frequently referenced tables in a single SQL file.

I typically store all my projects in a “Projects” folder off my root directory (aka C:\). Under it I create a folder for each project I work on. Within there I create a folder simply called SQL, where I store some SQL scripts. Some are just temporary as I work through issues, or as in this case a good place to store my commonly used lookups. It will wind up looking something like:

-- Employee SELECT DimEmployeeId, EmployeeBusinessKey FROM DimEmployee -- Company SELECT DimCompanyId, CompanyBusinessKey FROM DimCompany -- Office SELECT DimOfficeId, OfficeBusinessKey FROM DimOffice -- More here

That’s a very generic example, but you get the idea. Simple, but very handy.

Run As Robert

Recently I was a guest on Richard Campbell’s podcast “RunAs Radio”. It was a lot of fun, we talked about Business Intelligence as it’s evolved and what its future might be with technologies such as Hadoop up and coming. You can find the interview at:

http://www.runasradio.com/default.aspx?showNum=307 

One minor correction, in my bio he misread my “MCTS” certification as an MCT. Just wanted to clarify that point since I’m not an MCT (yet, LOL).

Creating a Data Warehouse Date Id in Task Factory Advanced Derived Column Transformation

The company I work for, Pragmatic Works, makes a great tool called Task Factory. It’s a set of transformations that plug into SQL Server Integration Services and provides a wealth of new controls you can use in your packages. One of these is the Advanced Derived Column Transformation. If you are familiar with the regular Derived Column transformation built into SSIS, you know that it can be painful to use if you have to create anything other than a very basic calculation. Every try typing something complex into that single row tiny little box? Egad.

The Task Factory Advanced Derived Column transform allows you to pop up a dialog and have true multi-line editing. In addition there are 180 addition functions to make your life easier. Which is actually the point of this whole post.

As a Business Intelligence developer, one of the things I have to do almost daily is convert a date data type to an integer. Most dates (at least in the US) are in Month / Day / Year format. Overseas the format is usually Day / Month / Year (which to me makes more sense). SQL Server Analysis Services loves integer based field, so a common practice is to store dates as an integer in YYYYMMDD format.

Converting a date to an integer using the derived column transform can be ugly. Here’s an example of a fairly common (although not the only) way to do it:


(DT_I4)((DT_WSTR,4)YEAR(MyDateColumn) + RIGHT("00" + (DT_WSTR,2)MONTH(MyDateColumn),2) + RIGHT("00" + (DT_WSTR,2)DAY(MyDateColumn),2))

Task Factory makes this much easier. There is a ToChar function which converts columns or values to characters. This function allows you to pass in a format to convert to. Wrap all that in a ToInteger function and away you go. Check this out:


ToInteger(ToChar(MyDateColumn, "yyyyMMdd"))

Much, much simpler. One thing, the case of the format is very important. It must be yyyyMMdd, otherwise it won’t work. If you want to extend this more, you can actually check for a null, and if it is null return a –1 (a common Id for a missing row) or another special integer to indicate a missing value, such as 19000101.


IIf(IsNull(MyDateColumn)
   , -1
   , ToInteger(ToChar(MyDateColumn, "yyyyMMdd"))
   )

Here we first check to see if the column is null, if so we return the missing value, else we return the date converted integer. And yes, you can do multi line code inside the Advanced Derived Column Transformation.

As you can see the Advanced Derived Column Transformation makes working with dates much, much easier than the standard derived column transformation. This is such a common need that, at the risk of sounding like an ad, I decided to blog about it so I can share this with all my clients in the future.

(Just to be clear, it’s not an ad, I was not asked to do this, nor did I receive any money for it. Mostly I did this post just so I could share the syntax when I start each project or training class.)

I’m Speaking! SQL Saturday Nashville and PowerShell Saturday Atlanta

Just wanted to let folks know I’ll be doing presentations at two upcoming events.

The first is SQL Saturday #145 in Nashville. That’s this weekend, October 13th. I’ll be giving my “Introduction to Data Warehousing / Business Intelligence” presentation. Here is the slide deck I’ll be using: introtodatawarehousing.pdf

My second presentation will be October 27 in Atlanta at PowerShell Saturday #003. Yep, the PowerShell guys are taking the Saturday concept and kicking off a series of PowerShell Saturdays. This is only the third, but I see many more coming in the future.

At PowerShell Saturday I’ll be presenting “Make SQL Server Pop with PowerShell”. I’ll cover both the SMO and SQL Provider during this session.

Looks like it’ll be a busy October, but I’d hurry as both events are filling up so don’t wait and get registered now!

Intro to DW/BI at SQL Saturday #167 Columbus GA

Today, September 8th 2012, I’m presenting at SQL Saturday #167 in Columbus, GA. My slides for this presentation can be found at:

https://arcanecode.files.wordpress.com/2012/08/introtodatawarehousing.pdf

The shortcuts from the slides are:

 

Thanks for attending!

devLink 2012–Intro to Data Warehousing / Business Intelligence

Today I’m presenting at devLink 2012. My slides for this presentation can be found at:

https://arcanecode.files.wordpress.com/2012/08/introtodatawarehousing.pdf

The shortcuts from the slides are:

Atlanta Code Camp 2012

It’s time again for the Atlanta Code Camp! Still a few hours left to register, you can do so at http://www.atlantacodecamp.com/. Why would you want to come? Well for one to see me, I’ll be giving two presentations.

The first is right after lunch, The Decoder Ring to DW/BI (Data Warehousing / Business Intelligence). In this talk I’ll walk through all the concepts behind designing a data warehouse, including some real world examples to help you understand the differences between Facts, Dimensions, Surrogate Keys, and more.

You may think a BI talk is an odd one for a developer oriented day. More and more though developers are being directed toward the data warehouse to get information for their applications, to combine with the system they are designing. Understanding how data warehouses work will give you a leg up when your company establishes it’s own data warehouse.

The slides for my presentations can be downloaded HERE.

My second session is “Become a PowerShell Pop Star”. In this very fast session we’ll start at ground zero with PowerShell, Microsoft’s scripting language. You’ll see all about cmdlets, variables, functions, programming and more.

SharePoint BI (Business Intelligence) Training Resources

SharePoint is a huge topic unto itself, so I wanted to provide some links that focus on learning how to use SharePoint in the context of Business Intelligence. 

A quick disclaimer, some of the links below are by co-workers or other people I have an affiliation with, financial or otherwise. That’s because I’m lucky enough to work with some of the best people in the field. Also, in the case of the books I’ve linked to the Kindle version where possible, mostly because I’m a Kindle junkie. There are paper versions of the books, and you are free to buy from your favorite retailer.

Books

SharePoint 2010 Business Intelligence 24 hour Trainer – This is a really cool book, in that it’s not just a book. It comes with a DVD loaded with video lessons on how to use the various tools in SharePoint for doing BI. This is the first book to buy if you are new to doing BI in SharePoint.

Business Intelligence in Microsoft SharePoint 2010 – This is a great book which provides an introduction to all of the BI Services available within SharePoint 2010.

Microsoft SharePoint 2010 Business Intelligence Unleashed – Like most of the books in the “Unleashed” series, this takes a much deeper dive into the tools than the previous two books. A good choice once you are ready to move beyond the beginner stage.

Blogs

There aren’t many blogs that focus solely on SharePoint for BI, so I’ve picked out two sites that have a lot of SharePoint for BI content on them.

Data Inspirations -  This blog focuses on all aspects of BI, one of the lead bloggers is Stacia Misner, co-author of the BI in Microsoft SharePoint 2010 book listed above.

BIDN – Many experts in the BI field, myself included, contribute to the Business Intelligence Developer Network. Here you’ll find a wide variety of topics in the BI field, including many on SharePoint BI.

Videos

Pragmatic Works Webinars – On our website we have a big catalog of past webinars (all of which are free to watch), many of which focus on PowerPivot.

Pluralsight – Pluralsight has an extensive catalog of other courses you can pick from. It’s subscription bases so there is a modest fee (starts at $29 US per month last I checked) but well worth it for the training you can get. There’s also a free trial.

For a quick link direct to this post, you can use http://bit.ly/arcanespbi

PowerPivot Training Resources

I’ve been asked to provide links to some useful resources for learning about PowerPivot. Below are a list of my favorite blogs, books, and other sites to learn from.

A quick disclaimer, some of the links below are by co-workers or other people I have an affiliation with, financial or otherwise. That’s because I’m lucky enough to work with some of the best people in the field. Also, in the case of the books I’ve linked to the Kindle version where possible, mostly because I’m a Kindle junkie. There are paper versions of the books, and you are free to buy from your favorite retailer.

Books

Professional Microsoft PowerPivot for Excel and SharePoint – This book covers all aspects of PowerPivot, from using it to installing it to configuration. Everything you want to know in one volume. I wouldn’t particularly call this a beginners book however, it assumes you are competent in BI, Excel, and SharePoint.

Microsoft PowerPivot for Excel 2010: Give Your Data Meaning – This is a good first book to get, it goes deeply into the use of PowerPivot within Excel.

Practical PowerPivot and DAX Formulas for Excel 2010 – Once you are comfortable with PowerPivot, you’ll want to learn more about DAX, Data Analysis eXpressions, the set of functions used to do advanced calculations and aggregations in PowerPivot. I’m a big fan of the way the author, Art Tennick does his books. It’s the Problem – Solution approach, where he demonstrates a common problem then shows one or more ways to solve it. Art also has books on MDX and DMX you should check out.

Blogs

PowerPivot Pro – Probably one of the best blogs around, Rob and Kasper provide excellent content.

Denny Lee’s Blog – Denny works for Microsoft and is part of the PowerPivot team. He provides some really great insights, and is co-author of the Professional Microsoft PowerPivot for Excel and SharePoint book above.

PowerPivot Info – Not so much a blog but a blog aggregator, this site brings the best PowerPivot content on the web to the forefront.

Videos

Pragmatic Works Webinars – On our website we have a big catalog of past webinars (all of which are free to watch), many of which focus on PowerPivot.

Pluralsight – A little shameless self promotion here. I did a complete course on PowerPivot for Pluralsight. This includes both using PowerPivot from Excel and managing PowerPivot within SharePoint. In addition, Pluralsight has an extensive catalog of other courses you can pick from. It’s subscription bases so there is a modest fee (starts at $29 US per month last I checked) but well worth it for the training you can get. There’s also a free trial.

For a quick link direct to this post, you can use http://bit.ly/arcanepivot

SSRS Training Resources

I’ve been asked to provide links to some useful resources for learning about SQL Server Reporting Services. Below are a list of my favorite blogs, books, and other sites to learn from.

A quick disclaimer, some of the links below are by co-workers or other people I have an affiliation with, financial or otherwise. That’s because I’m lucky enough to work with some of the best people in the field. Also, in the case of the books I’ve linked to the Kindle version where possible, mostly because I’m a Kindle junkie. There are paper versions of the books, and you are free to buy from your favorite retailer.

Books

Microsoft SQL Server 2008 Reporting Services Step by Step – A great beginner book, loaded with good examples.

Pro SQL Server 2008 Reporting Services – This book goes much more in-depth with SSRS, delves into many advanced topics.

Microsoft SQL Server Reporting Services Recipes – 2008 or 2012 version of book. This is a great book, especially if you are doing Business Intelligence reporting. Note Amazon says the 2008 version is no longer available in the US, but I’m betting you can find it in your local bookstore or from other retailers. The 2012 version is available for pre-order.

Applied Microsoft SQL Server 2008 Reporting Services – Great book, like the book above covers many aspects of SSRS including BI reporting. Note Amazon only sells the paper version, you can also get it in PDF format direct from the publishers website.

Professional SQL Server 2012 Administration – I mention this book because I wrote the chapter on SQL Server Reporting Services. I don’t go deep into creating reports, although I briefly cover Report Builder. I do go into configuring SSRS and how to do scale out deployments, the total chapter is around 50 pages.

Blogs

Paul Turley – Paul is an active blogger and fellow Microsoft MVP. He is also co-author of the Reporting Services Recipes book I listed above.

Tep Lachev – An active blogger, Teo is not only a good resource for SSRS but for other BI tools such as PowerPivot. He is also author of the Applied Microsoft SQL Server 2008 Reporting Services book, listed above.

Videos

Pragmatic Works Webinars – On our website we have a big catalog of past webinars (all of which are free to watch), many of which focus on SSRS.

Pluralsight – Pluralsight has an extensive catalog of courses, including some great SSRS content. It’s subscription bases so there is a modest fee (starts at $29 US per month last I checked) but well worth it for the training you can get. There’s also a free trial.

For a quick link direct to this post, you can use http://bit.ly/arcanessrs

PowerPivot with Pluralsight

For the last few months I’ve been cooped up here in the Arcane dungeon, working on a new project. And now that’s it’s released I can finally share all the details.

While I’m still happily employed during the days as a BI Architect for Comframe, in my spare time I’ve been acting as a technical advisor to a training company called Pluralsight.

I’ve developed a series of four training modules on PowerPivot for Excel 2010. The first module is an overview, that gives you the basic steps on creating a pivot table using PowerPivot.

Module 2 goes in-depth on working with data behind the scenes. How to add calculations, formatting data, filtering data and more.

In the third module I cover the creation and formatting of Pivot Tables, formatting them for users, using DAX (Data Analysis eXpressions), and how to use Sparklines with PowerPivot data.

In the last module we explore the charting capabilities of PowerPivot, covering various types of charts and ways to format charts. We then use the knowledge we’ve gained over the last four modules to create a realistic dashboard.

If you are already a subscriber to Pluralsight, take a look, the courses are now live. If you aren’t, take a look, Pluralsight has the best training I’ve seen on the net and it’s growing daily.