Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists

Yesterday I covered the basics around StopWords/StopLists in SQL Server 2008. Today we’ll look at some advanced queries you can use to get more information back about FullText Indexes, StopWords, and StopLists.

The first query simply returns a list of all of the catalogs in the system.

— Get current list of full text catalogs

select [name] as CatalogName

    , path

    , is_default

  from sys.fulltext_catalogs

 order by [name];

 

The next query returns a list of all the StopLists.

 — Get the list of StopLists

 select stoplist_id

      , name

   from sys.fulltext_stoplists;

 

This query returns a list of StopWords in the database. Note the linking to get the associated StopList name and language.

 — Get list of StopWords

 select sl.name as StopListName

      , sw.stopword as StopWord

      , lg.alias as LanguageAlias

      , lg.name  as LanguageName

      , lg.lcid  as LanguageLCID

   from sys.fulltext_stopwords sw

   join sys.fulltext_stoplists sl

    on sl.stoplist_id = sw.stoplist_id

   join master.sys.syslanguages lg

    on lg.lcid = sw.language_id;

 

This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement, you can not do this in SQL Server 2005.

— Get a list of the System provided stopwords  

select ssw.stopword

    , slg.name

  from sys.fulltext_system_stopwords ssw

  join sys.fulltext_languages slg

    on slg.lcid = ssw.language_id;

 

My next query returns a list of all the Full Text Indexes in the database.

— List full text indexes

select c.name as CatalogName

    , t.name as TableName

    , idx.name as UniqueIndexName

    , case i.is_enabled

        when 1 then ‘Enabled’

        else ‘Not Enabled’

       end as IsEnabled

    , i.change_tracking_state_desc

    , sl.name as StopListName

  from sys.fulltext_indexes i

  join sys.fulltext_catalogs c

    on i.fulltext_catalog_id = c.fulltext_catalog_id

  join sys.tables t

    on i.object_id = t.object_id

  join sys.indexes idx

    on i.unique_index_id = idx.index_id

       and i.object_id = idx.object_id

  left join sys.fulltext_stoplists sl

    on sl.stoplist_id = i.stoplist_id

 

 

This query returns a list of all the document types SQL Server 2008 understands when they are placed in a varbinary(max) field.

— List all of the document types SQL Server 2008 will understand in varbinary(max) field

select document_type

    , path

    , [version]

    , manufacturer

  from sys.fulltext_document_types;

 

If your full text performance begins to suffer over time, you might want to check and see how many fragments exist. If you have multiple closed fragments, you should consider doing a REORGANIZE on the index (using alter fulltext index). This query will tell you how many fragments exist for your full text index.

— See how many fragments exist for each full text index.

— If multiple closed fragments exist for a table do a REORGANIZE to help performance

select t.name as TableName

    , f.data_size

    , f.row_count

    , case f.status

        when 0 then ‘Newly created and not yet used’

        when 1 then ‘Being used for insert’

        when 4 then ‘Closed ready for query’

        when 6 then ‘Being used for merge inpurt and ready for query’

        when 8 then ‘Marked for deletion. Will not be used for query and merge source’

        else ‘Unknown status code’

       end

  from sys.fulltext_index_fragments f

  join sys.tables t on f.table_id = t.object_id;

 

There you go, a handful of powerful queries to help you query and maintain the state of your full text indexes.

Advertisements

Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search

While SQL Server 2005 had Noise Words for its full text search, SQL Server 2008 has moved to StopWords. The good news is the change is much deeper than just a rebranding. I say good news, because with the change comes a lot more flexibility and functionality.

There are actually two new tools introduced with SQL Server 2008, StopWords and StopList. A StopList acts as a named container for a group of StopWords. You can then associate a StopList with one or more tables. This is a great enhancement over Noise Words, which applied to the entire server. Now you can associate a group of StopWords, in a StopList, with specific tables without affecting the rest of the tables on the database or server.

First, let’s run a query to demonstrate. I’ll be doing all my testing against the AdventureWorks2008 database. If you don’t have it, you can download it at the CodePlex site. I’ve also used the standard Create Catalog command to create a full text catalog named AdvWorksCatalog. I then put a full text index on the Production.ProductDescription field. That done, here is a query I’ve used for examples in the past.

–Demo 1 – Query the Index

use AdventureWorks2008;

 

select [Name], ProductNumber, [Description]

from [Production].[Product] p

   , [Production].[ProductDescription] pd

   , [Production].[ProductModelProductDescriptionCulture] pmpdc

where p.ProductModelID = pmpdc.ProductModelID

  and pmpdc.ProductDescriptionID = pd.ProductDescriptionID

  and CONTAINS(pd.[Description], ‘shifting’)

 

Now we need to create the stoplist. Here’s we’ll use the first of the new SQL Server 2008 commands, create fulltext stoplist.

— Demo 2 – Create Stoplist

use AdventureWorks2008;

 

create fulltext stoplist ArcanesStopList;

The StopList will act as a holder for a specific set of words that we want to ignore. We refer to that group of words by the name we gave it, ArcanesStopList. Now we need to add some words to the list. Here are two ways to do so, both use the alter fulltext stoplist command.

— Demo 3

— Add StopWord

 

use AdventureWorks2008;

 

alter fulltext stoplist ArcanesStopList

  add ‘shifting’ language 1033;

 

alter fulltext stoplist ArcanesStopList

  add ‘light’ language ‘English’;

 

The command is straight forward, use the alter fulltext stoplist, give it the name of the list you want to add a word to. Then comes the word add, followed by the word you want to add. Next you have to specify the language. There are two ways to specify the language, either by using the language ID (in my case 1033) or the name for the language. If you don’t know your language, just use this query to find it:

— Here’s how to find your language  

select [name], alias, lcid from master.sys.syslanguages;  

If you were to jump the gun and rerun the query from Demo 1, you’d think it would now ignore the word shifting since we just added it as a StopWord to our StopList. But there’s still one more step.

You need to attach your StopList to a table that has a full text index on it. This is a major improvement over 2005. SQL Server 2008 now allows you to get quite granular with the application of custom groups of words. You are limited to one StopList per table, however. One StopList can be applied to multiple tables. Here’s the code to associate our StopList with a table:

— Demo 4 – Add the StopList to the Full Text Index

use AdventureWorks2008;

 

alter fulltext index on [Production].[ProductDescription]

  set stoplist ArcanesStopList;

The alter fulltext command has been modified to add a set stoplist set of keywords. All we need to do is specify the tablename and the StopList to associate with that table. Now go run the query from Demo 1. You should get back zero rows.

Congratulations! You’ve now associated your StopList with the full text index..

I’m sure however you don’t want to leave it this way, so let’s look at what it will take to clean up the mess. First, you can decide you no longer want the StopList associated with the full text index. Time to use the alter command again.

— Demo 4-2 – Remove the StopList to the Full Text Index

use AdventureWorks2008;

 

alter fulltext index on [Production].[ProductDescription]

  set stoplist system;

Setting the stoplist to the keyword system will change from your custom stoplist to the system stoplist. You can also use the word off instead of system to turn off StopWords all together for the specified table. Now would be a good time to mention, if you want to use the standard system set of StopWords, instead using a custom set, use the system keyword, as you see above.

There may be times when you want to remove just a word or two from a StopList, but not disassociate the entire list. Its possible to easily remove individual words from the list.

— Demo 4-3 – Remove single word from Stoplist

use AdventureWorks2008;

 

alter fulltext stoplist ArcanesStopList

  drop ‘shifting’ language 1033;

 

alter fulltext stoplist ArcanesStopList

  drop ‘light’ language ‘English’

The syntax for the drop is identical to the add, except of course for using the word drop instead of add.

Finally, you may want to drop the StopList all together. There’s a drop command for it as well.

— Demp 5-3 – Drop StopList

use AdventureWorks2008;

 

drop fulltext stoplist ArcanesStopList;

This covers the basic use of StopWords / StopLists in 2008. I believe the ability to associate a specific set of words with one or more specific tables will give your searches greater power and flexibility.