Category Archives: T-SQL

Presenting SQL Server 2005 2008 Full Text Searching at the TechEd 2008 InBetween conference

On June 7th, 2008 I am presenting “Getting Started with SQL Server 2005/2008” at the InBetween SQL Saturday conference that is taking place between the two weeks of Microsoft TechEd. This post will have all the links relevant to my talk.

Detailed descriptive material can be found on my blog, ArcaneCode.com, with links to specific posts below.

First off, the slides and sample code can be located at the Code Gallery site I setup specifically for Full Text Searching with SQL Server:

http://code.msdn.microsoft.com/SqlServerFTS

Look on the downloads page to see various projects around SQL Server Full Text Searching. I’ve created one “release” for each of the projects around FTS. Be sure to look on the right side at the various releases in order to see the various projects.

Next, you can get started with the basics by reading these entries on my blog:

Lesson 0 – Getting the Bits to do Full Text Searching in SQL Server 2005
Lesson 1 – The Catalog
Lesson 2 – The Indexes
Lesson 3 – Using SQL
Lesson 4 – Valid Data Types
Lesson 5 – Advanced Searching

After that you’ll be ready for some advanced topics.

Can you hear me now? Checking to see if FTS is installed.
Exploring SQL Servers FullTextCatalogProperty Function
Using the ObjectPropertyEx Function
Using FORMSOF in SQL Server Full Text Searching
Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search
Creating and Customizing Noise Words in SQL Server 2005 Full Text Search
Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search
Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists

Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search

SQL Server Full Text Search has the ability to search not just for a word, but for various forms of a word based upon the built in thesaurus. To use this ability in your own full text searches, you simply use the FormsOf clause. Here’s an example using the AdventureWorks database, with a full text index on the product description table’s Description field.

select [Name], ProductNumber, [Description]

from [Production].[Product] p

   , [Production].[ProductDescription] pd

   , [Production].[ProductModelProductDescriptionCulture] pmpdc

where p.ProductModelID = pmpdc.ProductModelID

  and pmpdc.ProductDescriptionID = pd.ProductDescriptionID

  and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, light)’ )

One of the biggest questions I get asked is, how do we customize the Thesaurus file used by SQL Server? Today we’ll look at how to do just that.

The first step is tracking it down. SQL Server stores the location of the thesaurus in the registry. The location depends on whether you are in SQL Server 2005 or 2008. For SQL Server 2005, open up regedit or your favorite editor and navigate to:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]

In my case, my instance was named MSSQL.1. I live in the US, so my language is “enu”, which stands for English, US. Don’t get this confused with “eng”, that is for our friends in the United Kingdom. (Unless of course that’s where you live in which case do indeed pick eng). There are around 17 supported languages out of the box.

OK, once you’ve navigated to the right spot in the registry tree, look for the key named TsaurusFile. Mine is listed as being located at:

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\tsenu.xml

In SQL Server 2008 you have to go to two places. First you need to find the path. For that, again open up regedit and navigate to:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSQLServer

For me, the instance name was MSSQL10.MSSQLSERVER. The name you should look for is “FullTextDefaultPath”. The structure was very similar to the 2005 version:

C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTData\

Now to find the file name. Jump up the registry tree a little to:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]

Just like with 2005, use ENU if you are in the US, or your appropriate language abbreviation. Looking at the name “TsaurusFile”, it probably comes as no surprise to discover the name is “tsenu.xml”.

Now that we’ve located it, let’s navigate to the folder and open it up in your favorite text editor (which if you’ve been reading my past blog posts is ANYTHING other than Notepad). Let’s take a look:

<XML ID=”Microsoft Search Thesaurus”>

<!–  Commented out

    <thesaurus xmlns=”x-schema:tsSchema.xml”>

    <diacritics_sensitive>0</diacritics_sensitive>

        <expansion>

            <sub>Internet Explorer</sub>

            <sub>IE</sub>

            <sub>IE5</sub>

        </expansion>

        <replacement>

            <pat>NT5</pat>

            <pat>W2K</pat>

            <sub>Windows 2000</sub>

        </replacement>

        <expansion>

            <sub>run</sub>

            <sub>jog</sub>

        </expansion>

    </thesaurus>

–>

</XML>

Note there is one minor difference between the 2005 version, shown above, and the 2008 version. In 2008, the <diacritics_sensitive> line is changed to:

       <diacritics = false/>

In either case, you won’t need to adjust this line. One other note, if you are editing this in Vista, be sure to launch your text editor in Run As Administrator mode so you can save the changes.

The first thing to notice is the file is commented out. We’ll want to uncomment as the first step. Next you’ll see the <thesaurus…> section. We won’t need to change it.

The next three sections are examples. Ultimately you’ll delete and replace with your own, but let’s take a moment to look at what is there. The first section is an <expansion>…</expansion> tag. With an expansion, all terms are equivalent. With the expansion tags, if the user enters any one of those terms, it’s the same as if they’d entered all of the terms. Thus in the first example, if a user were to type in “Internet Explorer”, a full text search would return all records that contained “Internet Explorer”, “IE”, or “IE5”.

Replacements are something you’ll use less often. With Replacements, SQL Server does not look for the word in the <pat> (pattern) tag, instead it looks for the word in the <sub> tag. In this case if a user types in W2K, the full text search engine will instead look for “Windows 2000”. The only true life situation I can think of where this would be useful is addresses. If you know your system converts all street or state abbreviations to their full expanded name, then you could use this as a trap of sort with full text searching. For example:

<replacement>

  <pat>St.</pat>

  <pat>Str.</pat>

  <sub>Street</sub>

</replacement>

Thus a user typing in Elm St. would be able to find Elm Street in your system. You could also expand this with states. Let me reiterate this example assumes your source system automatically replaces all abbreviations with their full words. Using a replacement saves the full text search engine for looking for words that won’t be there. Of the two, expansions will probably be the one you use most of the time.

So let’s create a simple example to test out our custom thesaurus. Open up your tsenu.xml file, and let’s change it to:

<XML ID=”Microsoft Search Thesaurus”>

    <thesaurus xmlns=”x-schema:tsSchema.xml”>

            <diacritics_sensitive>0</diacritics_sensitive>

        <expansion>

            <sub>light</sub>

            <sub>TheDoctor</sub>

        </expansion>

    </thesaurus>

</XML>

(Again, in 2008 the diacritics line will be slightly different, just use what 2008 put out there for you.) Here I’m going to make the word light and the word “TheDoctor” substitutable for each other. (I’m listening to the Doctor Who soundtrack as I type this, in case you were wondering. I wanted a word I knew I wouldn’t find in the Adventure Works database.) Save the changes.

Unfortunately just saving the changes is not enough to make the full text search engine pick up the changes to our thesaurus file. We have to tell SQL Server the file has been updated. The procedure differs greatly, depending on whether you run SQL Server 2005 or 2008.

In SQL Server 2005, you unfortunately must restart the entire full text search service. In a production environment, you never want to do this when the engine is busy or in use. I have a recent blog post that talks about the SQL Server Full Text Catalog Property function. In it, I have this handy query for you to be able to tell what the full text service is doing.

select case FullTextCatalogProperty(‘AdvWorksCatalog’, ‘PopulateStatus’)

          when 0 then ‘Idle’

          when 1 then ‘Full population in progress’

          when 2 then ‘Paused’

          when 3 then ‘Throttled’

          when 4 then ‘Recovering’

          when 5 then ‘Shutdown’

          when 6 then ‘Incremental population in progress’

          when 7 then ‘Building index’

          when 8 then ‘Disk is full. Paused.’

          when 9 then ‘Change tracking’

          else ‘Error reading FullTextCatalogProperty PopulateStatus’

        end

As long as it returns “Idle” you are clear to reset. Even so, I would make sure to do this during a database outage, especially with a production database. You’ve been warned!

To reset, go to Start, Control Panel, Administrative Tools, Services. Scroll down to SQL Server Full Text Search (MSSQLSERVER), right click and Restart. That should get it for SQL Server 2005.

In SQL Server 2008 it’s a good deal simpler. Make sure you are in the right database for your catalog (in this case AdventureWorks) and issue the command:

exec sys.sp_fulltext_load_thesaurus_file 1033;

go

The 1033 on the end refers to the local identifier (LCID) for the language of your thesaurus file. 1033 is for English, US. To discover your LCID, use this simple query:

select [name], alias, lcid from master.sys.syslanguages

OK, we’re at the finish line. Assuming your full text engine has restarted, it should have picked up your new, customized thesaurus file. Now we can alter our original query to:

select [Name], ProductNumber, [Description]

from [Production].[Product] p

   , [Production].[ProductDescription] pd

   , [Production].[ProductModelProductDescriptionCulture] pmpdc

where p.ProductModelID = pmpdc.ProductModelID

  and pmpdc.ProductDescriptionID = pd.ProductDescriptionID

  and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, TheDoctor)’ )

And we should get back the same results as the original query.

Please note that all remarks for SQL Server 2008 are based upon the February 2008 CTP 6 release. As future release candidates and final editions are released, some of the information could change. Be sure to check books on line for the latest info.

A final thing to note, if you read yesterday’s post you will note that the FullText keyword uses the thesaurus as part of it’s operation. If you have a custom thesaurus, it’s entries will also be used by FullText.

As you can see, adding custom entries to the SQL Server Full Text Search Thesaurus is not hard, there’s just some steps you need to follow in order to make it happen. Once you know how, you can use the functionality to make searches for your users more productive.

SQL Server Migration Assistant

At work we have an Oracle based system we’ll be retiring sometime next year. We want to keep the data around for reporting, but don’t have the manpower or funds to properly flatten the database into a true data warehouse schema. We are leaning then to copy the data into our SQL warehouse database in almost a direct copy of the legacy’s schema with just a few minor tweaks.

In looking around for an efficient way to handle this, I found a tool called the SQL Server Migration Assistant for Oracle. This handy tool will copy everything, tables, views, stored procedures, triggers, just about everything you’d want.

In my case, all I really want is the table layouts. I will be making a minor tweak to the table layouts so I can combine four databases into a single one. In just a few hours, I was able to use SSMA to generate create table scripts for several hundred tables. I then brought the script into my favorite text editor, UltraEdit, and used it’s regular expression capability to add a source database field as the first field in every create table script. Saved ’em, ran it and in short order have a complete schema ready to hold my data.

Even using the SQL Server Migration Assistant in such a limited fashion, I was still able to save myself weeks of coding and trying to figure out how to manually map Oracle datatypes to SQL Server. If you are looking at any kind of data conversion take a look at this tool. You can find more info on the SSMA site at:

http://www.microsoft.com/sql/solutions/migration/oracle/default.mspx

Arcane Saturday SQL Tip

I was lucky enough to escape the grip of the dread meeting monster this week and actually got to do some T-SQL coding. I needed to know what sever my T-SQL code was running on. It made a difference as I would be taking one action if we were on a production instance, another if we were on a test box. SQL Server 2005 has some nice system information views built into the master database. All you have to do is use:

Select [name] as CurrentServerName from sys.servers where server_id = 0;

That’s all there is to it, according to books online the row with server id 0 is always the current server. You may find it useful to look through the other rows in the table, as any linked servers are recorded here as well.

Introduction to SQL Server Integration Services – Huntsville Alabama Code Camp

My second post of the day at Alabama Code Camp 6 in Huntsville is “Introduction to SQL Server Integration Services”.

The slide deck is here: Intro to SSIS Slide Deck

The “cheat sheet” or script I used to do the demo is here: Script for doing the SSIS Demo You can step through it to recreate all of the things I did in the demo today.

Finally here is the finished project. It’s actually zipped, but my current host doesn’t like zip extensions so when you download it change the extension from txt back to ZIP. Finished SSIS Project

Getting Table Space Usage in SQL Server 2005

I had the need to determine how much space was being used by each of my tables in a certain schema. I found various solutions on the web, most of which involved the built in system stored procedure sp_spaceused. However none of them produced the output quite like I wanted it. Most worked on a single table, or else produced the output in multiple returns. So I did what comes naturally and cobbled together my own solution.It’s fairly straight forward. I create a temp table to hold the output, then create a cursor to cycle through all the table names in the schema set in the variable @schemaname. I set it to the most common one, dbo, but you may wish to change it. Next I cycle through each of the tables using a cursor and for each one call the sp_spaceused and insert the output as a new row in my temp table. Finally I select the results back to the user and drop the temp table to clean up.

You may note I check to see if the temp table exists as the very first thing, then if it is I drop it. Then as the last step I drop it. Call it paranoid programming, if something crashes part way through the routine I wanted the code to be “self repairing”, i.e. I didn’t want to have to remember to run the drop table statement just to start the procedure all over.

Also, you may have read somewhere that cursors in SQL Server are “evil”. Well, as a general rule you should avoid them in favor of set based coding. However there are times, such as this when I need to call a stored procedure for each row in the cursor, that a cursor will solve the issue nicely. Additionally this isn’t code I’m going to be running 500 times a day on a production server, but instead just occasionally as a system monitor type of utility.

You may notice that most of the columns are all returned as strings (varchars). That’s because the stored procedure returns them in that way. I may go back later and modify this to convert them to numbers so I can do some math, but we’ll see if that need arises.

Here’s the code, use it and abuse it, modify it and enjoy it!

/******************************************************************************

**    File: “GetTableSpaceUseage.sql”

**    Name: Get Table Space Useage for a specific schema

**    Auth: Robert C. Cain

**    Date: 01/27/2008

**

**    Desc: Calls the sp_spaceused proc for each table in a schema and returns

**        the Table Name, Number of Rows, and space used for each table.

**

**    Called by:

**     n/a – As needed

**

**    Input Parameters:

**     In the code check the value of @schemaname, if you need it for a

**     schema other than dbo be sure to change it.

**

**    Output Parameters:

**     NA

**

*******************************************************************************

**        Change History

*******************************************************************************

**    Date:            Author:                Description:

**    ——–        ——–            —————————————

**

*******************************************************************************/

 

/*—————————————————————————-*/

/* Drop the temp table if it’s there from a previous run                     */

/*—————————————————————————-*/

if object_id(N‘tempdb..[#TableSizes]’) is not null

  drop table #TableSizes ;

go

 

/*—————————————————————————-*/

/* Create the temp table                                                     */

/*—————————————————————————-*/

create table #TableSizes

  (

    [Table Name] nvarchar(128)   /* Name of the table */

  , [Number of Rows] char(11)    /* Number of rows existing in the table. */

  , [Reserved Space] varchar(18) /* Reserved space for table. */

  , [Data Space] varchar(18)    /* Amount of space used by data in table. */

  , [Index Size] varchar(18)    /* Amount of space used by indexes in table. */

  , [Unused Space] varchar(18)   /* Amount of space reserved but not used. */

  ) ;

go

 

/*—————————————————————————-*/

/* Load the temp table                                                        */

/*—————————————————————————-*/

declare @schemaname varchar(256) ;

— Make sure to set next line to the Schema name you want!

set @schemaname = ‘dbo’ ;

 

— Create a cursor to cycle through the names of each table in the schema

declare curSchemaTable cursor

  for select sys.schemas.name + ‘.’ + sys.objects.name

      from    sys.objects

            , sys.schemas

      where   object_id > 100

              and sys.schemas.name = @schemaname

              /* For a specific table uncomment next line and supply name */

            –and sys.objects.name = ‘specific-table-name-here’    

              and type_desc = ‘USER_TABLE’

              and sys.objects.schema_id = sys.schemas.schema_id ;

 

open curSchemaTable ;

declare @name varchar(256) ;  /* This holds the name of the current table*/

 

— Now loop thru the cursor, calling the sp_spaceused for each table

fetch curSchemaTable into @name ;

while ( @@FETCH_STATUS = 0 )

  begin    

    insert into #TableSizes

            exec sp_spaceused @objname = @name ;       

    fetch curSchemaTable into @name ;   

  end

 

— Important to both close and deallocate!

close curSchemaTable ;     

deallocate curSchemaTable ;

 

/*—————————————————————————-*/

/* Feed the results back                                                     */

/*—————————————————————————-*/

select [Table Name]

      , [Number of Rows]

      , [Reserved Space]

      , [Data Space]

      , [Index Size]

      , [Unused Space]

from    [#TableSizes]

order by [Table Name] ;

 

/*—————————————————————————-*/

/* Remove the temp table                                                     */

/*—————————————————————————-*/

drop table #TableSizes ;