Updating AdventureWorksDW2012 for Today

Like many of my fellow MVPs and presenters, I use the Adventure Works sample data from Microsoft in my presentations. Being a BI guy, I specifically use AdventureWorksDW2012, the data warehouse version of Adventure Works. I think you’d agree, though, that it’s gotten a little long in the tooth. All of the dates range from 2005 to 2008. This is especially irritating when demonstrating features that rely on the current date (think GETDATE() or NOW()).

Before you read further, let me stress again this is NOT for the typical AdventureWorks2012 database. This script is for the Data Warehouse version, AdventureWorksDW2012.

I scoured the search engines but couldn’t find anyone who had taken the time to come up with a way to update the database. Finally fed up, I did it myself. Below is a script which will add five years to each date in AdventureWorksDW2012. 2008 becomes 2013, 2007 becomes 2012, and so on. It turned out to be pretty simple.

Before you begin, though, a few prerequisites. First, you will need to have AdventureWorksDW2012 installed on your system. A friend and co-worker, Bradley Ball (@SQLBalls), pointed out one issue which I’ll pass along. He had trouble with the version of AdventureWorksDW2012 located at http://msftdbprodsamples.codeplex.com/releases/view/55330. When he just grabbed the mdf file and tried to create the database using the attach_rebuild_log option, it came out corrupted. Instead he suggested the version stored at http://www.wrox.com/WileyCDA/Section/Wrox-Books-Using-the-SQL-Server-2012-RTM-Database-Examples-Download.id-811144.html?DW_1118479580.zip. (I don’t think Wrox will mind, as I and many of my co-workers have written books for them. Nice folks.)

Next, please note this script was written with SQL Server 2012 in mind. It could be adapted for 2008 R2 by tweaking a few paths, though the IIF calls in Step 3 would also need to become CASE expressions, since IIF was introduced in SQL Server 2012. Speaking of paths, I use the defaults for everything; you’ll need to alter them if you used other paths.

Not wanting to mess with the original AdventureWorksDW2012, in Step 1 (these steps are numbered in the script below) I make a backup of the existing 2012 version. I then do a restore, renaming it to AdventureWorksDW2013. Be warned: if you have run this before and AdventureWorksDW2013 exists, it will be deleted. This is handy if you want an easy way to reset your 2013 version; if not, alter the script for your needs.
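If your instance does not use the default paths, it may help to confirm the logical and physical file names before running Step 1. The following is only a sketch, assuming AdventureWorksDW2012 is already attached; the backup path is the same default one the script uses and should be swapped for your own.

```sql
-- Where do the existing database files live? (safe to run before Step 1)
SELECT name          AS LogicalName
     , physical_name AS PhysicalPath
  FROM sys.master_files
 WHERE database_id = DB_ID(N'AdventureWorksDW2012');

-- Once Step 1.1 has written the backup, list the logical file names inside it.
-- These are the names the MOVE clauses in Step 1.3 have to match.
RESTORE FILELISTONLY
   FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\AdventureWorksDW2012.bak';
```

If the logical names differ from AdventureWorksDW2012_Data and AdventureWorksDW2012_Log, adjust the MOVE clauses accordingly.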

Later I will be inserting dates. I have a handy little routine that converts a datetime value to an integer in the YYYYMMDD format common for data warehouse date keys. I probably could have done this using some version of FORMAT, but I already had the routine written so I just grabbed and reused it. Note it also does some bounds checking, etc., that really wasn’t needed here, but like I said, I did a grab and reuse. So in Step 2 I create the function.
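For comparison, here is roughly what the shorter alternatives might look like. This is only a sketch: @SampleDate is a made-up value, and neither expression does the NULL or bounds checking that the reused routine does.

```sql
DECLARE @SampleDate DATETIME = '2013-07-04';   -- hypothetical value, for illustration only

-- CONVERT style 112 yields the string 'yyyymmdd', which casts cleanly to an integer
SELECT CAST(CONVERT(CHAR(8), @SampleDate, 112) AS INT) AS DateIdViaConvert;   -- 20130704

-- FORMAT (SQL Server 2012 and later) takes a .NET-style format string
SELECT CAST(FORMAT(@SampleDate, 'yyyyMMdd') AS INT) AS DateIdViaFormat;       -- 20130704
```

The CONVERT version also runs on older releases, which matters if you adapt the script for 2008 R2.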

In Step 3 I tackle the biggest task: inserting new rows into the date dimension. The DimDate table already had dates through the end of 2010, so I only had to generate 2011-2013. Inside a WHILE loop I iterate over each date individually, do the calculations to break out the various pieces of a date such as month number, quarter number, etc., and do an INSERT into the DimDate table. If you recall, the DimDate table in AdventureWorks has multi-language versions of the month and day names. I simply read the existing ones into table variables, then in the SELECT part of the INSERT INTO… SELECT statement do a join to these two table variables.
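To make the per-date calculations concrete, here is a small sketch of the pieces computed for one date. The sample date is hypothetical; the six-month fiscal shift mirrors what the script does with its FiscalDate variable.

```sql
DECLARE @CurrentDate DATE = '2013-08-15';                  -- hypothetical sample date
DECLARE @FiscalDate  DATE = DATEADD(m, 6, @CurrentDate);   -- 2014-02-15

SELECT MONTH(@CurrentDate)                 AS MonthNumberOfYear   -- 8
     , DATEPART(q, @CurrentDate)           AS CalendarQuarter     -- 3
     , YEAR(@CurrentDate)                  AS CalendarYear        -- 2013
     , IIF(MONTH(@CurrentDate) < 7, 1, 2)  AS CalendarSemester    -- 2
     , DATEPART(q, @FiscalDate)            AS FiscalQuarter       -- 1
     , YEAR(@FiscalDate)                   AS FiscalYear;         -- 2014
```

One thing to keep in mind: DATEPART(dw, ...), which the script uses for the day number of the week, depends on the session's SET DATEFIRST setting, so results can differ on servers that do not use the US English default.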

Of course, to do that I had to have a table to select from. None of my date data existed in a table, though; each piece of data was generated from the CurrentDate variable. So I simply created a third table variable named BogusTable and inserted a single row in it. This gave me something to join the month and day name tables to. I suppose I could have used CASE statements for each of the names, but this was more fun.
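Stripped down to its essentials, the single-row trick looks something like the sketch below. The table variable names and the three sample months are invented for illustration; the real script loads all of the names from DimDate.

```sql
DECLARE @CurrentDate DATE = '2013-03-15';   -- hypothetical sample date

-- A one-row table gives the SELECT something to hang the joins on
DECLARE @OneRow TABLE (PK TINYINT);
INSERT INTO @OneRow VALUES (1);

-- Tiny stand-in for the month name table
DECLARE @MonthNames TABLE (MonthNumberOfYear TINYINT, EnglishMonthName NVARCHAR(10));
INSERT INTO @MonthNames VALUES (1, N'January'), (2, N'February'), (3, N'March');

-- Every value comes from the variable; the join only picks up the matching name
SELECT @CurrentDate AS FullDate
     , m.EnglishMonthName
  FROM @OneRow
  JOIN @MonthNames m
    ON MONTH(@CurrentDate) = m.MonthNumberOfYear;   -- one row: 2013-03-15, March
```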

With the dates added to DimDate, it was time to move on to the Fact tables. In some cases it was very simple. For example, in Step 4.1 I just add 50,000 to the date key. Why 50,000? Simple date math. The dates are integers, so 20080101 is really 20,080,101. To bring it up to 2013, I simply added 50,000, giving 20,130,101 or 20130101.
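The arithmetic is easy to check for yourself. A minimal sketch, using a made-up sample key:

```sql
DECLARE @DateKey INT = 20080101;   -- hypothetical sample key

SELECT @DateKey / 10000      AS YearPart       -- 2008 (integer division)
     , @DateKey % 10000      AS MonthDayPart   -- 101, i.e. month 01, day 01
     , @DateKey + 5 * 10000  AS ShiftedKey;    -- 20130101
```

Because the year occupies everything above the ten-thousands place, adding 10,000 per year moves the year and leaves the month and day digits untouched.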

The two Sales fact tables had dates that fell on the 2008 leap day, February 29th. Adding five years would land on February 29th, 2013, which doesn’t exist, so I simply backed those up a day, shifting them to February 28th. I took a slightly different approach with the Currency Rate fact table, shifting the 2008 leap day to the 2012 leap day, then omitting February 29th from the rest of the update. Also note that on this and the Product Inventory table, the Date Key was actually part of the Primary Key. Thus I had to first drop the Primary Key, make the changes to the dates, then recreate the Primary Key.
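Here is a quick sketch of why the leap day needs special handling. The integer key and the actual date column react differently to a five-year shift:

```sql
-- The integer key is shifted blindly and ends up naming a day that does not exist
SELECT 20080229 + 50000 AS ShiftedKey;                                -- 20130229

-- DATEADD, on the other hand, clamps to the last valid day of the month
SELECT DATEADD(yy, 5, CAST('2008-02-29' AS DATE)) AS ShiftedDate;     -- 2013-02-28
```

Left alone, the key and the date column would disagree, and the key would have no matching row in DimDate. Moving those rows to February 28th first keeps both in step.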

One last note on the Fact tables: all of the dates in the Call Center table were set to 2010. For those I merely added 30,000, shifting them from 2010 to 2013. (Don’t ask me why those have 2010 dates when the rest of the sample data is 2005-2008. I have not a clue.)

As a final step, Step 5, I drop the little helper function DateToDateId I created way back in Step 2. And that’s it! You now have a handy demo / practice database with dates that are actually current.
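If you would like some reassurance after the script finishes, a couple of quick checks along these lines might help. This is a sketch rather than part of the original script; it uses only tables and columns the script itself touches.

```sql
USE [AdventureWorksDW2013];

-- The date dimension should now run through the end of 2013
SELECT MAX(DateKey) AS LatestDimDateKey
  FROM dbo.DimDate;

-- The sales dates should have moved forward five years
SELECT MIN(OrderDateKey) AS EarliestOrderKey
     , MAX(OrderDateKey) AS LatestOrderKey
  FROM dbo.FactInternetSales;

-- Any fact key without a matching DimDate row would show up here (expect zero)
SELECT COUNT(*) AS OrphanedOrderKeys
  FROM dbo.FactInternetSales f
  LEFT JOIN dbo.DimDate d
    ON f.OrderDateKey = d.DateKey
 WHERE d.DateKey IS NULL;
```

The same orphan check can be repeated for the other fact tables by swapping in their date key columns.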

A big thanks to my co-workers at Pragmatic Works (@PragmaticWorks | http://pragmaticworks.com ) for helping me test this out and making sure it worked with their stuff.

Enjoy!

 

PS Most browsers don’t seem to render the code in a monospace font. Be assured when you paste into SSMS everything should line back up again, assuming of course you use a monospace font in SSMS.

 

/*-----------------------------------------------------------------------------------------------*/
/* Updating AdventureWorksDW2012 for Today */
/* */
/* Robert C. Cain, http://arcanecode.com @ArcaneCode */
/* */
/* Script Copyright (c) 2013 by Robert C. Cain */
/* AdventureWorks database Copyright (c) Microsoft. */
/* */
/* This script will make a backup of the AdventureWorksDW2012 database, then copy and restore it */
/* as AdventureWorksDW2013. It will then update it for current dates. 2008 now becomes 2013, */
/* 2007 is now 2012, and so forth. This script is dependent on the AdventureWorksDW2012 sample */
/* database already being installed. It won't change AdventureWorksDW2012 in any way. */
/* */
/* Be warned, if AdventureWorksDW2013 exists, it will be deleted as part of this process. */
/* */
/*-----------------------------------------------------------------------------------------------*/

PRINT 'Updating AdventureWorksDW2012 for Today - Starting'
GO

/*-----------------------------------------------------------------------------------------------*/
/* Step 1 - Make a copy of AdventureWorksDW2012 and restore as AdventureWorksDW2013 */
/*-----------------------------------------------------------------------------------------------*/
SET NOCOUNT ON

USE [master]

-- Step 1.1. Make a backup of AdventureWorksDW2012 ----------------------------------------------
PRINT 'Backing up AdventureWorksDW2012'
GO

BACKUP DATABASE [AdventureWorksDW2012]
TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\AdventureWorksDW2012.bak'
WITH NOFORMAT,
INIT,
NAME = N'AdventureWorksDW2012-Full Database Backup',
SKIP,
NOREWIND,
NOUNLOAD,
STATS = 10
GO


-- Step 1.2. Delete the database AdventureWorksDW2013 if it exists ------------------------------
PRINT 'Deleting AdventureWorksDW2013, if it exists'
GO

IF (EXISTS (SELECT 1
FROM master.dbo.sysdatabases
WHERE name = 'AdventureWorksDW2013' )
)
EXEC msdb.dbo.sp_delete_database_backuphistory @database_name = N'AdventureWorksDW2013'
GO

IF (EXISTS (SELECT 1
FROM master.dbo.sysdatabases
WHERE name = 'AdventureWorksDW2013' )
)
DROP DATABASE [AdventureWorksDW2013]
GO

-- Step 1.3. Restore the database to a new copy -------------------------------------------------
PRINT 'Restoring AdventureWorksDW2012 to AdventureWorksDW2013'
GO

RESTORE DATABASE [AdventureWorksDW2013]
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\AdventureWorksDW2012.bak'
WITH FILE = 1,
MOVE N'AdventureWorksDW2012_Data'
TO N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorksDW2013_Data.mdf',
MOVE N'AdventureWorksDW2012_Log'
TO N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\AdventureWorksDW2013_log.ldf',
NOUNLOAD, STATS = 5

GO

PRINT 'Done Creating AdventureWorksDW2013'
GO



/*-----------------------------------------------------------------------------------------------*/
/* Step 2. Create a helper function to convert dates to a YYYYMMDD format Date Id. */
/*-----------------------------------------------------------------------------------------------*/

USE [AdventureWorksDW2013]
GO

IF EXISTS (SELECT [name] FROM [sys].[all_objects] WHERE [name] = 'DateToDateId')
DROP FUNCTION [dbo].[DateToDateId];
GO

CREATE FUNCTION [dbo].[DateToDateId]
(
@Date DATETIME
)
RETURNS INT
AS
BEGIN

DECLARE @DateId AS INT
DECLARE @TodayId AS INT

SET @TodayId = YEAR(GETDATE()) * 10000
+ MONTH(GETDATE()) * 100
+ DAY(GETDATE())

-- If the date is missing, or a placeholder for a missing date, set to the Id for missing dates
-- Else convert the date to an integer
IF @Date IS NULL OR @Date = '1900-01-01' OR @Date = -1
SET @DateId = -1
ELSE
BEGIN
SET @DateId = YEAR(@Date) * 10000
+ MONTH(@Date) * 100
+ DAY(@Date)
END

-- If there's any data prior to 2000 it was incorrectly entered, mark it as missing
IF @DateId BETWEEN 0 AND 19991231
SET @DateId = -1

-- Commented out for this project as future dates are OK
-- If the date is in the future, don't allow it, change to missing
-- IF @DateId > @TodayId
-- SET @DateId = -1

RETURN @DateId

END

GO




/*-----------------------------------------------------------------------------------------------*/
/* Step 3. Add new dates to the dbo.DimDate table. */
/*-----------------------------------------------------------------------------------------------*/
PRINT 'Adding new dates to dbo.DimDate'
GO

SET NOCOUNT ON

-- Later we will be writing an INSERT INTO... SELECT FROM to insert the new record. I want to
-- join the day and month name memory variable tables, but need to have something to join to.
-- Since everything is calculated, we'll just create this little bogus table to have something
-- to select from.
DECLARE @BogusTable TABLE
( PK TINYINT)

INSERT INTO @BogusTable SELECT 1;


-- Create a table variable to hold the days of the week with their various language versions
DECLARE @DayNameTable TABLE
( [DayNumberOFWeek] TINYINT
, [EnglishDayNameOfWeek] NVARCHAR(10)
, [SpanishDayNameOfWeek] NVARCHAR(10)
, [FrenchDayNameOfWeek] NVARCHAR(10)
)

INSERT INTO @DayNameTable
SELECT DISTINCT
[DayNumberOFWeek]
, [EnglishDayNameOfWeek]
, [SpanishDayNameOfWeek]
, [FrenchDayNameOfWeek]
FROM dbo.DimDate

-- Create a month table to hold the months and their language versions.
DECLARE @MonthNameTable TABLE
( [MonthNumberOfYear] TINYINT
, [EnglishMonthName] NVARCHAR(10)
, [SpanishMonthName] NVARCHAR(10)
, [FrenchMonthName] NVARCHAR(10)
)

INSERT INTO @MonthNameTable
SELECT DISTINCT
[MonthNumberOfYear]
, [EnglishMonthName]
, [SpanishMonthName]
, [FrenchMonthName]
FROM dbo.DimDate

-- This is the start and end date ranges to use to populate the
-- dbo.DimDate dimension. Change if it's 2014 and you run across this script.
DECLARE @FromDate AS DATE = '2011-01-01'
DECLARE @ThruDate AS DATE = '2013-12-31'

-- CurrentDate will be incremented each time through the loop below.
DECLARE @CurrentDate AS DATE
SET @CurrentDate = @FromDate

-- FiscalDate will be set six months into the future from the CurrentDate
DECLARE @FiscalDate AS DATE

-- Now we simply loop over every date between the From and Thru, inserting the
-- calculated values into DimDate.
WHILE @CurrentDate <= @ThruDate
BEGIN

SET @FiscalDate = DATEADD(m, 6, @CurrentDate)

INSERT INTO dbo.DimDate
SELECT [dbo].[DateToDateId](@CurrentDate)
, @CurrentDate
, DATEPART(dw, @CurrentDate) AS DayNumberOFWeek
, d.EnglishDayNameOfWeek
, d.SpanishDayNameOfWeek
, d.FrenchDayNameOfWeek
, DAY(@CurrentDate) AS DayNumberOfMonth
, DATEPART(dy, @CurrentDate) AS DayNumberOfYear
, DATEPART(wk, @CurrentDate) AS WeekNumberOfYear
, m.EnglishMonthName
, m.SpanishMonthName
, m.FrenchMonthName
, MONTH(@CurrentDate) AS MonthNumberOfYear
, DATEPART(q, @CurrentDate) AS CalendarQuarter
, YEAR(@CurrentDate) AS CalendarYear
, IIF(MONTH(@CurrentDate) < 7, 1, 2) AS CalendarSemester
, DATEPART(q, @FiscalDate) AS FiscalQuarter
, YEAR(@FiscalDate) AS FiscalYear
, IIF(MONTH(@FiscalDate) < 7, 1, 2) AS FiscalSemester
FROM @BogusTable
JOIN @DayNameTable d
ON DATEPART(dw, @CurrentDate) = d.[DayNumberOFWeek]
JOIN @MonthNameTable m
ON MONTH(@CurrentDate) = m.MonthNumberOfYear

SET @CurrentDate = DATEADD(d, 1, @CurrentDate)
END
GO

-- If you want to verify you can uncomment this line.
-- SELECT * FROM dbo.DimDate WHERE DateKey > 20110000

PRINT 'Done adding new dates to dbo.DimDate'
GO





/*-----------------------------------------------------------------------------------------------*/
/* Step 4. Update the Fact Tables with the new dates. */
/*-----------------------------------------------------------------------------------------------*/


PRINT 'Update Fact Tables'
GO

SET NOCOUNT ON

-- To move forward five years, we simply add 50,000 to the date key

-- 4.1 FactFinance ------------------------------------------------------------------------------
PRINT ' FactFinance'
GO

UPDATE [dbo].[FactFinance]
SET [DateKey] = [DateKey] + 50000;


-- 4.2 FactInternetSales ------------------------------------------------------------------------
PRINT ' FactInternetSales'
GO

-- There are a few rows where the order, due, or ship date falls on the 2008 leap day (Feb 29).
-- Back these off a day so that adding five years lands on a date that exists.
UPDATE [dbo].[FactInternetSales]
SET [OrderDateKey] = 20080228
, [OrderDate] = '2008-02-28'
WHERE [OrderDateKey] = 20080229

UPDATE [dbo].[FactInternetSales]
SET [DueDateKey] = 20080228
, [DueDate] = '2008-02-28'
WHERE [DueDateKey] = 20080229

UPDATE [dbo].[FactInternetSales]
SET [ShipDateKey] = 20080228
, [ShipDate] = '2008-02-28'
WHERE [ShipDateKey] = 20080229

-- Now update the rest of the days.
UPDATE [dbo].[FactInternetSales]
SET [OrderDateKey] = [OrderDateKey] + 50000
, [DueDateKey] = [DueDateKey] + 50000
, [ShipDateKey] = [ShipDateKey] + 50000
, [OrderDate] = DATEADD(yy, 5, [OrderDate])
, [DueDate] = DATEADD(yy, 5, [DueDate])
, [ShipDate] = DATEADD(yy, 5, [ShipDate])


-- 4.3 FactResellerSales ------------------------------------------------------------------------
PRINT ' FactResellerSales'
GO

-- As with Internet Sales, there are rows where a date falls on the 2008 leap day (Feb 29).
-- Back these off a day so that adding five years lands on a date that exists.
UPDATE [dbo].[FactResellerSales]
SET [OrderDateKey] = 20080228
, [OrderDate] = '2008-02-28'
WHERE [OrderDateKey] = 20080229

UPDATE [dbo].[FactResellerSales]
SET [DueDateKey] = 20080228
, [DueDate] = '2008-02-28'
WHERE [DueDateKey] = 20080229

UPDATE [dbo].[FactResellerSales]
SET [ShipDateKey] = 20080228
, [ShipDate] = '2008-02-28'
WHERE [ShipDateKey] = 20080229

-- Now update the table
UPDATE [dbo].[FactResellerSales]
SET [OrderDateKey] = [OrderDateKey] + 50000
, [DueDateKey] = [DueDateKey] + 50000
, [ShipDateKey] = [ShipDateKey] + 50000
, [OrderDate] = DATEADD(yy, 5, [OrderDate])
, [DueDate] = DATEADD(yy, 5, [DueDate])
, [ShipDate] = DATEADD(yy, 5, [ShipDate])

-- 4.4 FactSalesQuota ---------------------------------------------------------------------------
PRINT ' FactSalesQuota'
GO

UPDATE [dbo].[FactSalesQuota]
SET [DateKey] = [DateKey] + 50000

-- 4.5 FactSurveyResponse -----------------------------------------------------------------------
PRINT ' FactSurveyResponse'
GO

UPDATE [dbo].[FactSurveyResponse]
SET [DateKey] = [DateKey] + 50000

-- 4.6 FactCallCenter ---------------------------------------------------------------------------
PRINT ' FactCallCenter'
GO

-- All the rows in call center have a 2010 date, just add 3 years to make these 2013
UPDATE [dbo].[FactCallCenter]
SET [DateKey] = [DateKey] + 30000


-- 4.7 FactCurrencyRate -------------------------------------------------------------------------
PRINT ' FactCurrencyRate'
GO

-- Because the DateKey is part of the PK, we have to drop the key before we can update it
ALTER TABLE [dbo].[FactCurrencyRate] DROP CONSTRAINT [PK_FactCurrencyRate_CurrencyKey_DateKey]
GO

-- Shift the 2008 Leap Year days to 2012 Leap Year
UPDATE [dbo].[FactCurrencyRate]
SET [DateKey] = 20120229
WHERE [DateKey] = 20080229

-- Update everything except the leap year we fixed already
UPDATE [dbo].[FactCurrencyRate]
SET [DateKey] = [DateKey] + 50000
WHERE [DateKey] <> 20120229

-- Add the PK back
ALTER TABLE [dbo].[FactCurrencyRate]
ADD CONSTRAINT [PK_FactCurrencyRate_CurrencyKey_DateKey] PRIMARY KEY CLUSTERED
( [CurrencyKey] ASC,
[DateKey] ASC
)
WITH ( PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, IGNORE_DUP_KEY = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO


-- 4.8 FactProductInventory ---------------------------------------------------------------------
PRINT ' FactProductInventory'
GO

-- As with the previous step, the date is part of the primary key, so we need to drop it first.
ALTER TABLE [dbo].[FactProductInventory] DROP CONSTRAINT [PK_FactProductInventory]
GO

-- Shift the 2008 Leap Year days to 2012 Leap Year
UPDATE [dbo].[FactProductInventory]
SET [DateKey] = 20120229
WHERE [DateKey] = 20080229

-- Update everything except the leap year we fixed already
UPDATE [dbo].[FactProductInventory]
SET [DateKey] = [DateKey] + 50000
WHERE [DateKey] <> 20120229

-- Add the PK back
ALTER TABLE [dbo].[FactProductInventory]
ADD CONSTRAINT [PK_FactProductInventory] PRIMARY KEY CLUSTERED
( [ProductKey] ASC
, [DateKey] ASC
)
WITH ( PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, IGNORE_DUP_KEY = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO

PRINT 'Done updating the Fact tables'
GO



/*-----------------------------------------------------------------------------------------------*/
/* Step 5. Cleanup, remove the helper function we added earlier. */
/*-----------------------------------------------------------------------------------------------*/
PRINT 'Removing Helper Function'
GO

IF EXISTS (SELECT 1 FROM [sys].[all_objects] WHERE [name] = 'DateToDateId')
DROP FUNCTION [dbo].[DateToDateId];
GO

/*-----------------------------------------------------------------------------------------------*/
/* All done! */
/*-----------------------------------------------------------------------------------------------*/
PRINT 'Updating AdventureWorksDW2012 for Today - Completed'
GO

SSIS tip for Lookup Transformations

The Lookup Transformation is a cornerstone of almost any SSIS package. The vast majority of packages that load Fact tables use Lookups extensively. Many of these lookups reference the same tables over and over.

As we all (hopefully) know by now, opting to look up against a whole table is akin to doing a SELECT *. Best practice is to select only the columns you really need for a lookup, typically the surrogate key and business key. Because I often reference the same tables over and over, I’ve taken to keeping the queries for all my frequently referenced tables in a single SQL file.

I typically store all my projects in a “Projects” folder off my root directory (aka C:\). Under it I create a folder for each project I work on. Within there I create a folder simply called SQL, where I store some SQL scripts. Some are just temporary as I work through issues, or as in this case a good place to store my commonly used lookups. It will wind up looking something like:

-- Employee
SELECT DimEmployeeId, EmployeeBusinessKey FROM DimEmployee

-- Company
SELECT DimCompanyId, CompanyBusinessKey FROM DimCompany

-- Office
SELECT DimOfficeId, OfficeBusinessKey FROM DimOffice

-- More here

That’s a very generic example, but you get the idea. Simple, but very handy.

Creating a Data Warehouse Date Id in Task Factory Advanced Derived Column Transformation

The company I work for, Pragmatic Works, makes a great tool called Task Factory. It’s a set of transformations that plug into SQL Server Integration Services and provide a wealth of new controls you can use in your packages. One of these is the Advanced Derived Column Transformation. If you are familiar with the regular Derived Column transformation built into SSIS, you know that it can be painful to use if you have to create anything other than a very basic calculation. Ever try typing something complex into that tiny little single-row box? Egad.

The Task Factory Advanced Derived Column transform allows you to pop up a dialog and have true multi-line editing. In addition there are 180 additional functions to make your life easier. Which is actually the point of this whole post.

As a Business Intelligence developer, one of the things I have to do almost daily is convert a date data type to an integer. Most dates (at least in the US) are in Month / Day / Year format. Overseas the format is usually Day / Month / Year (which to me makes more sense). SQL Server Analysis Services loves integer-based fields, so a common practice is to store dates as an integer in YYYYMMDD format.

Converting a date to an integer using the derived column transform can be ugly. Here’s an example of a fairly common (although not the only) way to do it:


(DT_I4)((DT_WSTR,4)YEAR(MyDateColumn) + RIGHT("00" + (DT_WSTR,2)MONTH(MyDateColumn),2) + RIGHT("00" + (DT_WSTR,2)DAY(MyDateColumn),2))

Task Factory makes this much easier. There is a ToChar function which converts columns or values to characters. This function allows you to pass in a format to convert to. Wrap all that in a ToInteger function and away you go. Check this out:


ToInteger(ToChar(MyDateColumn, "yyyyMMdd"))

Much, much simpler. One thing: the case of the format is very important. It must be yyyyMMdd, otherwise it won’t work. If you want to extend this further, you can check for a null, and if the column is null return a -1 (a common Id for a missing row) or another special integer to indicate a missing value, such as 19000101.


IIf(IsNull(MyDateColumn)
   , -1
   , ToInteger(ToChar(MyDateColumn, "yyyyMMdd"))
   )

Here we first check to see if the column is null, if so we return the missing value, else we return the date converted integer. And yes, you can do multi line code inside the Advanced Derived Column Transformation.

As you can see the Advanced Derived Column Transformation makes working with dates much, much easier than the standard derived column transformation. This is such a common need that, at the risk of sounding like an ad, I decided to blog about it so I can share this with all my clients in the future.

(Just to be clear, it’s not an ad, I was not asked to do this, nor did I receive any money for it. Mostly I did this post just so I could share the syntax when I start each project or training class.)

I’m Speaking! SQL Saturday Nashville and PowerShell Saturday Atlanta

Just wanted to let folks know I’ll be doing presentations at two upcoming events.

The first is SQL Saturday #145 in Nashville. That’s this weekend, October 13th. I’ll be giving my “Introduction to Data Warehousing / Business Intelligence” presentation. Here is the slide deck I’ll be using: introtodatawarehousing.pdf

My second presentation will be October 27 in Atlanta at PowerShell Saturday #003. Yep, the PowerShell guys are taking the Saturday concept and kicking off a series of PowerShell Saturdays. This is only the third, but I see many more coming in the future.

At PowerShell Saturday I’ll be presenting “Make SQL Server Pop with PowerShell”. I’ll cover both the SMO and SQL Provider during this session.

Looks like it’ll be a busy October. Both events are filling up, so don’t wait: get registered now!

Data Warehousing / Business Intelligence at Pluralsight

It’s been another busy month, and I’m pleased to announce my two newest modules were published today on Pluralsight. If you’re not familiar with Pluralsight, they are a training company that has a vast library of training videos.

The description of my new course can be found here:

http://www.pluralsight-training.net/microsoft/olt/Course/Toc.aspx?n=intro-dwbi-course

There are two modules. The first is an introductory session for DW/BI. It’s primarily slides; by the end you’ll have a grasp on the arcane terms around Business Intelligence such as facts, dimensions, surrogate keys, OLAP, and more.

The second module is an overview of the Microsoft tools for doing DW/BI. It starts with the Adventure Works Lite database. From there a data warehouse is built, on top of which an analysis services cube is created. Finally a report will be generated off the cube that meets a specific business need.

Along the way attendees will get to see Visual Studio 2010 Database Projects, along with the SQL Server toolkit: SQL Server Integration Services, SQL Server Analysis Services, and SQL Server Reporting Services. Attendees will also get a look at SQL Server Management Studio.

Enjoy!

CodeStock 2010

It’s June, must be time for CodeStock! For those who don’t know, CodeStock is a conference put on annually by the East Tennessee .Net Users Group. This year I am fortunate to have been selected for two presentations.

The first is The Decoder Ring for Data Warehousing / Business Intelligence. This is a concepts talk in which you’ll learn about the terms and overall design of a Data Warehouse, and what they mean when they say Business Intelligence. While we’ll mention the products SQL Server offers, unfortunately we won’t have time for much in the way of a demo.

My second session of the day is a nice follow on to the above session, but will also work even should you not have been in the first session. In SSIS For Developers, we’ll look at how SSIS, commonly used in Data Warehousing, can also be used by most developers to solve issues that frequently come up in the course of their job. Data conversion and exporting data are two good examples, and we’ll also look at how to call your new SSIS job from your .Net application.

There are two code demos used during the presentation, both available at my Code Gallery site. The first is the basic SSIS For Devs demo with the three packages. The second is the more complex example showing how to call SSIS from your .Net application.

What I learned at TechEd

Last week I was at the Microsoft TechEd conference in North America, along with over 10,000 of my closest friends. I spent a lot of time in the Microsoft floor area talking to people, and came away with some interesting info about new technologies. As I’m sharing some of these at the Steel City SQL user group tonight, I thought I’d share here too.

First up is OData, the Open Data Protocol from Microsoft. It is an ATOM feed, but for data. People can publish under the OData format, and the data can be consumed as either JSON or AtomPub. You can also add security, should you wish to have data available to many consumers but only on a permission basis. You can learn more at http://www.odata.org

Next up is Microsoft’s new “Dallas” project. Dallas is the code name for a data marketplace on its Azure platform. Through Dallas, users and vendors will be able to consume and provide data feeds. Some will be free; others will carry a cost. There is a catalog through which consumers can look at the various feeds available. This is very much in its infancy, but there are a few feeds which you can look at and preview.

Microsoft’s SQL Server 2008 R2 Parallel Data Warehouse looked interesting, although it fits a very niche market. It’s an appliance you can purchase that is essentially a rack of SQL Servers. One is the master server, and coordinates all the child servers. As a DBA you manage what appears to be a “normal” instance of SQL Server. Behind the scenes the controller will propagate changes to the other servers in its hub. Scaling can be achieved by simply adding more servers to the existing rack, or additional racks as needed. PDW becomes economical starting around 10 terabytes and scales to well over 100 terabytes of data.

The folks at Red Gate have a new tool called SQL Search that they have released for free to the community. SQL Search is an add-on for SQL Server Management Studio that does lightning-fast searches of object names in your database. Just pick the database name and term to search for and SQL Search will populate a grid with all possible matches. If you double click on a row it will navigate SSMS’s Object Explorer pane to the correct spot in the navigation tree for your object. Further, if the object is a view, stored proc, etc., it will even display the SQL of the object and highlight the searched item. And did I mention it’s free?

Speaking of cool, free tools the folks at Confio have created a free version of their popular Ignite tool called IgniteFree. It is a real time performance monitoring tool that will work with not just SQL Server but Oracle and DB2 as well. They have versions of the tool that run on both Windows and Unix/Linux.

PowerPivot continues to fascinate and excite me. While I was at TechEd I won a copy of “PowerPivot for Excel and SharePoint”. I had this on my “to buy” list anyway, so I considered myself lucky. I’m about a sixth of the way through the book and it has been really good so far. It starts with a quick tour of the Excel piece, then walks you through the SharePoint install so you can quickly get up and running in a test environment. Later chapters delve much more deeply into PowerPivot. If you are looking for a good PowerPivot book I would recommend it.

Finally, even if you couldn’t be there, you can watch the sessions from this and past TechEds. Microsoft has released them to the general public at http://www.msteched.com/

 

*FTC Disclosure: I am in the “Friends of Red Gate” program, where I get copies of their tools in order to test and provide feedback. In this case the disclaimer probably isn’t necessary since the SQL Search tool is freely available to all, but I’d prefer to keep things above board.

SharePoint Saturday Huntsville

It’s another Saturday, must be time for another speaking engagement. Today I’m in Huntsville, Alabama, speaking at their SharePoint Saturday. Now you may wonder what a SQL BI guy is doing at a SharePoint event. There are two kinds of BI professionals in the world: those who are currently working with SharePoint and those who will be working with SharePoint. I believe the converse could also be said for SharePoint people. There are those who are currently hosting BI solutions, and those who will be hosting BI solutions. It’s important, then, that SharePoint professionals have a clear understanding of what business intelligence is, including all the terms that get used in the BI world. In addition, SharePoint 2010 contains features to support a new tool called PowerPivot. Understanding what a PowerPivot workbook is, and how their customers will be using it, will give SharePoint professionals a better understanding of how to host PowerPivot content.

My first session of the day is entitled “Off and Running With PowerPivot for Excel 2010”. In this session I’ll be giving an overview of PowerPivot, then we’ll be looking at how to import and manipulate data within Excel. This presentation is demo heavy but there are a few slides which can be downloaded from this link.

In my second presentation we’ll dig into the guts of data warehousing and business intelligence. Business Intelligence has its own vocabulary; during the session the audience will learn all the buzzwords around BI and dig into the data warehouse to see how these terms are implemented. This particular presentation is a slide-heavy deck; unlike my previous session there is little in the way of demonstrations. We’ll briefly touch on SharePoint, but understand that this presentation is centered largely around what BI is, so you’ll know what your SharePoint system is being used for. The slides are available from this link.

If you are reading this after the fact, then let me thank you for coming to my presentations. I hope you found them useful, please don’t hesitate to contact me if I can be of further assistance.

Live Streaming from SQL Saturday 41

One of the sponsors for tomorrow’s SQL Saturday in Atlanta, Georgia, a company named Set Focus, is going to be live streaming three presentations from the event. I just got the word that my session, "Introduction to Data Warehousing/Business Intelligence", was selected as one of the sessions. My session kicks off the event at 8:30 a.m. Eastern time. Information and a link to the stream site can be found on Set Focus’s blog:

http://blogs.setfocus.com/radar/2010/04/22/streaming-sqlsaturday/

SQL Saturday 41 was sold out some time ago, and there is even quite a waiting list, so if you’re unable to attend then at least you can sit in on three of the sessions via the live stream. The other two sessions to be streamed are "SQL Server Memory Deep Dive" by Kevin Boles and "Database Design Patterns" by Louis Davidson. Both are fellow Microsoft MVPs and excellent presenters, I know you’ll enjoy their presentations as well.

Live streaming technology really excites me. While I feel that you can get the best experience and education from being live at the event, I also understand that this is not always possible for everyone. Work conflicts, distance, family obligations, or the event simply being sold out, as this one is, can limit a person’s ability to attend in person. Live streaming events such as SQL Saturday really helps extend our reach into the community and serve those who, for whatever reason, cannot be with us at the event. I want to give a great big thanks to the folks over at Set Focus for making this happen.

Presenting “Intro to DW/BI” at the Regions Bank Users Group

Today I have the privilege of presenting to the internal Regions Bank users group an Introduction to Data Warehousing and Business Intelligence.

The PowerPoint slide deck is available for download at this link.

To all who made it, thanks for attending and please let me know if you have any questions.

SQL Saturday 29 Birmingham

We just finished up our SQL Saturday here in Birmingham, Alabama. It was number 29 in the list of SQL Saturdays. First off, let me cover some supporting material for the two sessions I gave. The first was an introduction to Microsoft’s new self-service BI tool, PowerPivot. Attendees can download my slides here: PowerPivot Slides. You can also see all my posts so far on PowerPivot at http://arcanecode.com/category/powerpivot/ or by picking PowerPivot from the drop down over on the right side of this blog.

My second session of the day was an introduction to SSIS. Step by step instructions, the sample project, and the slide deck can all be found on my Code Gallery site. At the end we got a bit rushed for time; hopefully some of your questions can be answered from some of my past posts on SSIS. If not, feel free to send me an e-mail (rcain at comframe.com or arcanecode at gmail.com) and I’ll see what I can do to help.

Attendees of both sessions may also find my Introduction to Data Warehousing/Business Intelligence slide deck helpful to clarify some BI terminology.

In my roles as speaker, volunteer, and event planner I had little time to take pictures, but I did grab a few at the very end of the day, I thought I’d share them here:

IMAGE_152 

Some lucky winners of books looking over the remaining stack to pick out their prize.

John Baldwin, our fearless leader is in the grey shirt all the way on the right.

IMAGE_153

Obligatory crowd shot. Dividers broke this big room down to 4 rooms where we had our sessions.

IMAGE_154

One more crowd shot, showing some of the higher end swag including a Wii, a Garmin GPS, multiple copies of Office and Windows 7, and two copies of the SQL Server MVP Deep Dives book I coauthored.

My role in event planning was acting as the speaker coordinator: finding quality people willing to travel to Birmingham, on their own time and expense, to give presentations. Thanks to Sven Aelterman, Kevin Boles, Louis Davidson, Janis Griffin, Kevin Grohoske, Geoff Hiten, Rodney Landrum, Vincent Mayfield, Aaron Nelson, Barry Ralston, Joe Webb and Jim Wooley. It was their presentations that helped us draw the big crowd we did.

Speaking of the crowd, much thanks to all of you who attended. The folks in my sessions were very attentive, asked many good questions, and kept the discussion lively and interesting. I’m glad all of you came and hope to see you all next year.

I also would be remiss if I didn’t thank all of our sponsors. Without their donations we would not have been able to put on the event. Thanks to Microsoft, Teksouth, Bit Wizards, Confio, Redgate, Attunity, Telerik, Intellinet, CozyRoc, Wrox, TekSystems and O’Reilly Publishing, and to the SQL PASS organization.

I should also give a personal note of thanks to my employer, COMFRAME, for putting up with my extended lunches and letting me juggle my schedule to run errands, handle other planning activities, and prepare my presentations.

Finally, a few last thanks and kudos are definitely in order. The first should go to Vito Amato and his merry band of volunteers. They kept everyone in cold drinks, helped the speakers with their needs, answered attendees’ questions, checked folks in at the door, and in general did everything that needed to be done to keep the event running smoothly.

A big thanks and congratulations to John Baldwin, our fearless leader, and his right hand man Morgan Smith for taking the leadership to plan and organize the event. They worked long and hard to make the event the success it was.

If you want to keep the fun and education continuing, we’d love to have you join us at our monthly user group meetings, http://www.steelcitysql.org/. Thanks for a great SQL Saturday, and I can’t wait for next year’s!

Pragmatic Works Free SQL Server Training

Just wanted to mention that Pragmatic Works did a week of free webinars. All of them are available to view, and they cover a wide variety of subjects. The sessions include:

  • Introduction to Managing a SQL Server Database by Jorge Segarra
  • Beginning T-SQL by Patrick LeBlanc
  • The Modern Resume: Building Your Brand by Brian Knight
  • How to Become An Exceptional DBA by Brad McGehee
  • Fundamentals of SSIS by Brian Knight
  • 0 to Cube in 60 Minutes (SSAS) by Brian Knight
  • Trouble Shooting SQL Server by Christian Bolton
  • Introduction to SQL Server Reporting Services by Devin Knight

To get access to the seminars, simply go to:

http://www.pragmaticworks.com/resources/webinars/February2010Webinar.aspx

Alabama Code Camp Mobile 2010

Last Saturday was the Alabama Code Camp, held in Mobile, AL. For those unfamiliar with the Alabama Code Camps, we hold on average two a year, and they shift from city to city with different user groups acting as the host group. Other cities include Huntsville, Birmingham, and Montgomery. This time, though, the Lower Alabama Dot Net User Group under the leadership of Ryan Duclos hosted, and what a great event it was. Everything ran smoothly, there were plenty of drinks and pizza to go around, and some good swag to boot. A big congrats to Ryan and his team of volunteers for a great event. Thanks also to Microsoft for sponsoring and the University of South Alabama for the venue.

I was kept busy at this code camp, doing three sessions. The first session was “Introduction to Microsoft PowerPivot”. The slide deck can be found at http://arcanecode.files.wordpress.com/2010/01/powerpivot_long.pdf. To see all my PowerPivot posts, simply pick it in the categories to the right or use this link: http://arcanecode.com/category/powerpivot/.

My second session was on Full Text Searching. You can find code samples and the PDF for the presentation at my code gallery site, http://code.msdn.microsoft.com/SqlServerFTS.

The final presentation was an introduction to Business Intelligence and Data Warehousing. Here is the link to the presentation’s slides in PDF format. As promised in the session, I added the additional information for the Kimball Group book.

A quick apology for my delay in posting; a nasty head cold has had me in Zombie land since I got back. Thanks to all who attended. I appreciate how attentive and interactive you were, with lots of questions. I look forward to the next time Mobile hosts the Alabama Code Camp.

PowerPivot for Excel 2010 at the Steel City SQL Users Group January 19, 2009

Today I am at the Steel City SQL Users Group, presenting on Microsoft’s newest BI Tool, PowerPivot.

The slide deck for this meeting can be found at:

http://arcanecode.files.wordpress.com/2010/01/powerpivot_long.pdf

To see all my posts so far on PowerPivot, you can use the link below to filter.

http://arcanecode.com/category/powerpivot/

Thanks for coming!

Refreshing PowerPivot Data

If you remember my Intro to PowerPivot post, you’ll recall that every organization has that power user who leveraged the tools on hand, mostly Excel, to do his or her own form of Business Intelligence. One headache that plagued “that guy” was the freshness of data. It could be quite difficult to keep the data used in his or her spreadsheets up to date. One of the great boons that PowerPivot offers is the ability to quickly and easily refresh data.

Once again let’s return to the Excel 2010 spreadsheet we created in my Combining Data from Multiple Sources in PowerPivot post. In this example we’ll put ourselves in the role of a salesperson, creating an analysis for one of our best customers, one Lucy Harrington. Going to the Customer tab in our PowerPivot workbook, we click on the drop down menu for the FirstName column. Clear the “Select All” option, then scroll down and check only the first name “Lucy”, which should be sufficient to find only the customer we wish.

clip_image001

In looking at her record, we are horrified to see that some nit-wit in data entry had improperly given our best customer the title of “Mr.” even though we know Lucy is not only a “Mrs.” but, as a newlywed, very insistent on the use of the Mrs. title.

clip_image002

Correcting this is easy enough. We can go make the fix in the source system, or perhaps we bribe the DBA with lunch so he’ll make our data fix a high priority and issue this update command to the database:

UPDATE SalesLT.Customer
   SET Title = 'Mrs.'
 WHERE FirstName = 'Lucy'
   AND LastName = 'Harrington';
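
Since an UPDATE changes every row its WHERE clause matches, it can be worth checking first how many customers share the first name. A minimal check, using only columns that exist in SalesLT.Customer:

```sql
-- List every customer named Lucy so we can see exactly which rows
-- an update filtered on first name would touch.
SELECT CustomerID
     , Title
     , FirstName
     , LastName
  FROM SalesLT.Customer
 WHERE FirstName = 'Lucy';
```

If more than one row comes back, tighten the filter (by last name or by CustomerID, for example) before running the update.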

So now how do we refresh our data in PowerPivot? Easy enough. In PowerPivot on the Home tab you’ll notice a Refresh button in the Connections group. Clicking on it gives us two options, one to refresh just the current table we are viewing, another to refresh all of the tables in our PowerPivot work area. For this example we’ll pick just the Refresh command.

clip_image003

Once you pick Refresh the Data Refresh dialog appears to let you know of its progress. When done it will let you know of its success and you can then click the Close button to close the window. At this point you’ll see some behavior that I can only attribute to the fact that as of this writing PowerPivot is still in the CTP stage. My screen flashed a few times, then when it was done I was on the last tab in my PowerPivot workbook, the CountryInfo tab. Simply click on the Customer tab to return.

A great thing about the Refresh is that it left our filters intact. You’ll see we are still only looking at Lucy’s record, and even better, we can see the title of our best customer is now corrected:

clip_image004

If you’re using the same sheet from the previous lessons, you will also notice the calculated column we’d put in place, FullName, is still present.

clip_image005

A word of caution.

When using the Refresh feature, be aware that every time you do a refresh you are pulling the entire dataset back, not just the changed rows. Using SQL Server Profiler I monitored the Refresh operation from the server side, and found the query issued was:

SELECT [SalesLT].[Customer].*
  FROM [SalesLT].[Customer]

If you were to attempt this on a large table in your production system during a busy production time it could have severely negative consequences. Imagine the slowdown that would occur if this were executed against a system having hundreds of millions of rows.

When getting into data sets that large, I would advise moving towards a more structured data warehousing environment. This would provide a structure better suited for analysis. If nothing else, though, get IT to set up a copy of the production system in a location updated during off peak hours so that analysts can pull massive amounts of data without impacting production systems.
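
Short of a full warehouse or a reporting copy, one stopgap is to have PowerPivot import from a narrower view rather than the base table, so each refresh re-runs a smaller query. The sketch below is only an illustration: the view name vCustomerAnalysis, the choice of columns, and the one-year ModifiedDate cutoff are all assumptions made for the example, not part of the workbook built in these posts.

```sql
-- Hypothetical view: the name, column list, and date cutoff are
-- illustrative assumptions. Adjust them to fit your own data.
CREATE VIEW SalesLT.vCustomerAnalysis
AS
SELECT CustomerID
     , Title
     , FirstName
     , LastName
     , CompanyName
     , ModifiedDate
  FROM SalesLT.Customer
 WHERE ModifiedDate >= DATEADD(YEAR, -1, GETDATE());  -- rows changed in the last year
GO
```

A refresh against a view like this still pulls back every row the view returns; it simply returns fewer rows and fewer columns than SELECT * against the full table. The cutoff here is arbitrary, so pick a filter that keeps the rows your analysis actually needs.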

Summary

As long as you take into account the strain on the source systems, the easy one-button refresh does provide a great way for analysts to ensure the data they are making business decisions from is up to date.
