Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search

While SQL Server 2005 had Noise Words for its full text search, SQL Server 2008 has moved to StopWords. The good news is the change is much deeper than just a rebranding. I say good news, because with the change comes a lot more flexibility and functionality.

There are actually two new tools introduced with SQL Server 2008, StopWords and StopList. A StopList acts as a named container for a group of StopWords. You can then associate a StopList with one or more tables. This is a great enhancement over Noise Words, which applied to the entire server. Now you can associate a group of StopWords, in a StopList, with specific tables without affecting the rest of the tables on the database or server.

First, let’s run a query to demonstrate. I’ll be doing all my testing against the AdventureWorks2008 database. If you don’t have it, you can download it at the CodePlex site. I’ve also used the standard Create Catalog command to create a full text catalog named AdvWorksCatalog. I then put a full text index on the Production.ProductDescription field. That done, here is a query I’ve used for examples in the past.

–Demo 1 – Query the Index

use AdventureWorks2008;

 

select [Name], ProductNumber, [Description]

from [Production].[Product] p

   , [Production].[ProductDescription] pd

   , [Production].[ProductModelProductDescriptionCulture] pmpdc

where p.ProductModelID = pmpdc.ProductModelID

  and pmpdc.ProductDescriptionID = pd.ProductDescriptionID

  and CONTAINS(pd.[Description], ‘shifting’)

 

Now we need to create the stoplist. Here’s we’ll use the first of the new SQL Server 2008 commands, create fulltext stoplist.

– Demo 2 – Create Stoplist

use AdventureWorks2008;

 

create fulltext stoplist ArcanesStopList;

The StopList will act as a holder for a specific set of words that we want to ignore. We refer to that group of words by the name we gave it, ArcanesStopList. Now we need to add some words to the list. Here are two ways to do so, both use the alter fulltext stoplist command.

– Demo 3

– Add StopWord

 

use AdventureWorks2008;

 

alter fulltext stoplist ArcanesStopList

  add ‘shifting’ language 1033;

 

alter fulltext stoplist ArcanesStopList

  add ‘light’ language ‘English’;

 

The command is straight forward, use the alter fulltext stoplist, give it the name of the list you want to add a word to. Then comes the word add, followed by the word you want to add. Next you have to specify the language. There are two ways to specify the language, either by using the language ID (in my case 1033) or the name for the language. If you don’t know your language, just use this query to find it:

– Here’s how to find your language  

select [name], alias, lcid from master.sys.syslanguages;  

If you were to jump the gun and rerun the query from Demo 1, you’d think it would now ignore the word shifting since we just added it as a StopWord to our StopList. But there’s still one more step.

You need to attach your StopList to a table that has a full text index on it. This is a major improvement over 2005. SQL Server 2008 now allows you to get quite granular with the application of custom groups of words. You are limited to one StopList per table, however. One StopList can be applied to multiple tables. Here’s the code to associate our StopList with a table:

– Demo 4 – Add the StopList to the Full Text Index

use AdventureWorks2008;

 

alter fulltext index on [Production].[ProductDescription]

  set stoplist ArcanesStopList;

The alter fulltext command has been modified to add a set stoplist set of keywords. All we need to do is specify the tablename and the StopList to associate with that table. Now go run the query from Demo 1. You should get back zero rows.

Congratulations! You’ve now associated your StopList with the full text index..

I’m sure however you don’t want to leave it this way, so let’s look at what it will take to clean up the mess. First, you can decide you no longer want the StopList associated with the full text index. Time to use the alter command again.

– Demo 4-2 – Remove the StopList to the Full Text Index

use AdventureWorks2008;

 

alter fulltext index on [Production].[ProductDescription]

  set stoplist system;

Setting the stoplist to the keyword system will change from your custom stoplist to the system stoplist. You can also use the word off instead of system to turn off StopWords all together for the specified table. Now would be a good time to mention, if you want to use the standard system set of StopWords, instead using a custom set, use the system keyword, as you see above.

There may be times when you want to remove just a word or two from a StopList, but not disassociate the entire list. Its possible to easily remove individual words from the list.

– Demo 4-3 – Remove single word from Stoplist

use AdventureWorks2008;

 

alter fulltext stoplist ArcanesStopList

  drop ‘shifting’ language 1033;

 

alter fulltext stoplist ArcanesStopList

  drop ‘light’ language ‘English’

The syntax for the drop is identical to the add, except of course for using the word drop instead of add.

Finally, you may want to drop the StopList all together. There’s a drop command for it as well.

– Demp 5-3 – Drop StopList

use AdventureWorks2008;

 

drop fulltext stoplist ArcanesStopList;

This covers the basic use of StopWords / StopLists in 2008. I believe the ability to associate a specific set of words with one or more specific tables will give your searches greater power and flexibility.

About these ads

3 Responses to “Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search”

  1. Neil Burnett Says:

    Any idea on how to ignore stopwords in search phrases? the transform noise words setting is ignored in 2008.

  2. SQL Server Full Text Searching at the Atlanta Code Camp « Arcane Code Says:

    [...] and 2008 Full Text Search Creating and Customizing Noise Words in SQL Server 2005 Full Text Search Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / [...]

  3. Mark P Says:

    Excellent Review, thanks for laying down the basics which were easy to follow…even when there is noise in the background.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 104 other followers

%d bloggers like this: