Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search

SQL Server Full Text Search has the ability to search not just for a word, but for various forms of a word based upon the built in thesaurus. To use this ability in your own full text searches, you simply use the FormsOf clause. Here’s an example using the AdventureWorks database, with a full text index on the product description table’s Description field.

select [Name], ProductNumber, [Description]

from [Production].[Product] p

   , [Production].[ProductDescription] pd

   , [Production].[ProductModelProductDescriptionCulture] pmpdc

where p.ProductModelID = pmpdc.ProductModelID

  and pmpdc.ProductDescriptionID = pd.ProductDescriptionID

  and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, light)’ )

One of the biggest questions I get asked is, how do we customize the Thesaurus file used by SQL Server? Today we’ll look at how to do just that.

The first step is tracking it down. SQL Server stores the location of the thesaurus in the registry. The location depends on whether you are in SQL Server 2005 or 2008. For SQL Server 2005, open up regedit or your favorite editor and navigate to:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]

In my case, my instance was named MSSQL.1. I live in the US, so my language is “enu”, which stands for English, US. Don’t get this confused with “eng”, that is for our friends in the United Kingdom. (Unless of course that’s where you live in which case do indeed pick eng). There are around 17 supported languages out of the box.

OK, once you’ve navigated to the right spot in the registry tree, look for the key named TsaurusFile. Mine is listed as being located at:

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\tsenu.xml

In SQL Server 2008 you have to go to two places. First you need to find the path. For that, again open up regedit and navigate to:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSQLServer

For me, the instance name was MSSQL10.MSSQLSERVER. The name you should look for is “FullTextDefaultPath”. The structure was very similar to the 2005 version:

C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTData\

Now to find the file name. Jump up the registry tree a little to:

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]

Just like with 2005, use ENU if you are in the US, or your appropriate language abbreviation. Looking at the name “TsaurusFile”, it probably comes as no surprise to discover the name is “tsenu.xml”.

Now that we’ve located it, let’s navigate to the folder and open it up in your favorite text editor (which if you’ve been reading my past blog posts is ANYTHING other than Notepad). Let’s take a look:

<XML ID=”Microsoft Search Thesaurus”>

<!–  Commented out

    <thesaurus xmlns=”x-schema:tsSchema.xml”>

    <diacritics_sensitive>0</diacritics_sensitive>

        <expansion>

            <sub>Internet Explorer</sub>

            <sub>IE</sub>

            <sub>IE5</sub>

        </expansion>

        <replacement>

            <pat>NT5</pat>

            <pat>W2K</pat>

            <sub>Windows 2000</sub>

        </replacement>

        <expansion>

            <sub>run</sub>

            <sub>jog</sub>

        </expansion>

    </thesaurus>

–>

</XML>

Note there is one minor difference between the 2005 version, shown above, and the 2008 version. In 2008, the <diacritics_sensitive> line is changed to:

       <diacritics = false/>

In either case, you won’t need to adjust this line. One other note, if you are editing this in Vista, be sure to launch your text editor in Run As Administrator mode so you can save the changes.

The first thing to notice is the file is commented out. We’ll want to uncomment as the first step. Next you’ll see the <thesaurus…> section. We won’t need to change it.

The next three sections are examples. Ultimately you’ll delete and replace with your own, but let’s take a moment to look at what is there. The first section is an <expansion>…</expansion> tag. With an expansion, all terms are equivalent. With the expansion tags, if the user enters any one of those terms, it’s the same as if they’d entered all of the terms. Thus in the first example, if a user were to type in “Internet Explorer”, a full text search would return all records that contained “Internet Explorer”, “IE”, or “IE5”.

Replacements are something you’ll use less often. With Replacements, SQL Server does not look for the word in the <pat> (pattern) tag, instead it looks for the word in the <sub> tag. In this case if a user types in W2K, the full text search engine will instead look for “Windows 2000”. The only true life situation I can think of where this would be useful is addresses. If you know your system converts all street or state abbreviations to their full expanded name, then you could use this as a trap of sort with full text searching. For example:

<replacement>

  <pat>St.</pat>

  <pat>Str.</pat>

  <sub>Street</sub>

</replacement>

Thus a user typing in Elm St. would be able to find Elm Street in your system. You could also expand this with states. Let me reiterate this example assumes your source system automatically replaces all abbreviations with their full words. Using a replacement saves the full text search engine for looking for words that won’t be there. Of the two, expansions will probably be the one you use most of the time.

So let’s create a simple example to test out our custom thesaurus. Open up your tsenu.xml file, and let’s change it to:

<XML ID=”Microsoft Search Thesaurus”>

    <thesaurus xmlns=”x-schema:tsSchema.xml”>

            <diacritics_sensitive>0</diacritics_sensitive>

        <expansion>

            <sub>light</sub>

            <sub>TheDoctor</sub>

        </expansion>

    </thesaurus>

</XML>

(Again, in 2008 the diacritics line will be slightly different, just use what 2008 put out there for you.) Here I’m going to make the word light and the word “TheDoctor” substitutable for each other. (I’m listening to the Doctor Who soundtrack as I type this, in case you were wondering. I wanted a word I knew I wouldn’t find in the Adventure Works database.) Save the changes.

Unfortunately just saving the changes is not enough to make the full text search engine pick up the changes to our thesaurus file. We have to tell SQL Server the file has been updated. The procedure differs greatly, depending on whether you run SQL Server 2005 or 2008.

In SQL Server 2005, you unfortunately must restart the entire full text search service. In a production environment, you never want to do this when the engine is busy or in use. I have a recent blog post that talks about the SQL Server Full Text Catalog Property function. In it, I have this handy query for you to be able to tell what the full text service is doing.

select case FullTextCatalogProperty(‘AdvWorksCatalog’, ‘PopulateStatus’)

          when 0 then ‘Idle’

          when 1 then ‘Full population in progress’

          when 2 then ‘Paused’

          when 3 then ‘Throttled’

          when 4 then ‘Recovering’

          when 5 then ‘Shutdown’

          when 6 then ‘Incremental population in progress’

          when 7 then ‘Building index’

          when 8 then ‘Disk is full. Paused.’

          when 9 then ‘Change tracking’

          else ‘Error reading FullTextCatalogProperty PopulateStatus’

        end

As long as it returns “Idle” you are clear to reset. Even so, I would make sure to do this during a database outage, especially with a production database. You’ve been warned!

To reset, go to Start, Control Panel, Administrative Tools, Services. Scroll down to SQL Server Full Text Search (MSSQLSERVER), right click and Restart. That should get it for SQL Server 2005.

In SQL Server 2008 it’s a good deal simpler. Make sure you are in the right database for your catalog (in this case AdventureWorks) and issue the command:

exec sys.sp_fulltext_load_thesaurus_file 1033;

go

The 1033 on the end refers to the local identifier (LCID) for the language of your thesaurus file. 1033 is for English, US. To discover your LCID, use this simple query:

select [name], alias, lcid from master.sys.syslanguages

OK, we’re at the finish line. Assuming your full text engine has restarted, it should have picked up your new, customized thesaurus file. Now we can alter our original query to:

select [Name], ProductNumber, [Description]

from [Production].[Product] p

   , [Production].[ProductDescription] pd

   , [Production].[ProductModelProductDescriptionCulture] pmpdc

where p.ProductModelID = pmpdc.ProductModelID

  and pmpdc.ProductDescriptionID = pd.ProductDescriptionID

  and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, TheDoctor)’ )

And we should get back the same results as the original query.

Please note that all remarks for SQL Server 2008 are based upon the February 2008 CTP 6 release. As future release candidates and final editions are released, some of the information could change. Be sure to check books on line for the latest info.

A final thing to note, if you read yesterday’s post you will note that the FullText keyword uses the thesaurus as part of it’s operation. If you have a custom thesaurus, it’s entries will also be used by FullText.

As you can see, adding custom entries to the SQL Server Full Text Search Thesaurus is not hard, there’s just some steps you need to follow in order to make it happen. Once you know how, you can use the functionality to make searches for your users more productive.

About these ads

8 Responses to “Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search”

  1. Meir Says:

    Great post.

    I was wondering if it is possible to customize SQL Server’s inflection lists (also known as stemming)?

    Thanks in advance.

  2. Mike Says:

    Hi,
    this description was very useful, short and with the accurate information.
    happy second advent!
    Mike

  3. SQL Server Full Text Searching at the Atlanta Code Camp « Arcane Code Says:

    [...] Function Using the ObjectPropertyEx Function Using FORMSOF in SQL Server Full Text Searching Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search Creating and Customizing Noise Words in SQL Server 2005 Full Text Search Creating and Customizing [...]

  4. Josh Says:

    Thanks, that was very handy info…..

  5. sridhar Says:

    This is really helpful.

  6. Amit Says:

    Thank you so much! You are amazing.

  7. cargamestus Says:

    Want to enjoy [http://www.cargames55.com/ car games – ? If the way to go is yes, which is suggest we’ve got the identical curiosity. Sure I feel a big lover of vehicle game titles. As i am fan of auto game titles on-line, I made a decision to build a internet site that will completely helping auto video games only, sure the idea 100% linked to car games. The web page tackle will be cargames55.com . Every one of the online games had been personally selected for their quality simply video games which secure for children will be published.

    Not like an additional gaming website that will encourage their internet site using auto video games as his or her search term, however nearly all of their web-site’s content is not really pertinent using the keyword they pick. Enable say they are ahead Search consequence using search term auto game titles, but in fact their particular site’s articles in connection with auto game titles simply less and then 10% of their overall games. Yes these people combined their particular game titles series even if the video games risk-free pertaining to child or otherwise not, they don’t value, they merely use it in the same position without having concern about grow older restriction.

    By participate in [http://www.cargames55.com/ car games – on cargames55.com, you won’t find any kind of game titles that’s not related to car. These were effectively categorized to Some classes. Sometimes you may feel requirement for rate? Simply choose the automobile race video games group. Sense your adrenaline hurry any time sporting around the fast track signal, price highway or even around the leave similar to move, system competition, NASCAR ethnic background, block race and more. Want to test out your driving a car talent? Simply choose the automobile traveling video games category. Test out your traveling talent controlling car or truck say for example a Sport utility vehicle, sedan, pickup as well as coach while travelling, gangway and other place. Want to teach how to park cars? Merely select the [http://www.cargames55.com/ car games – class.
    Try to preventing a vehicle along with leaving behind the idea unoccupied for more than a simple moment. Car parking on one or each party of your road is frequently allowed, although often using constraints. Educate how to recreation area invert or concurrent with out reach just about any physical objects close to your automobile. Have you been a new focusing mania? Just pick the automobile adjusting games class. In simple terms stage system pimp my experience game titles. Your online games are only for personalize as well as modify any parts of your automobile regarding seems to be, efficiency along with dealing with. Melody 325i repairs, Mercedes, Audi, Lamborghini and lots of other awesome cars. Customize the exhaust, tires, wheels, headgear, rims, front lights, physique products, fresh paint, motor and then any additional automobile pieces.

  8. responsive Woocommerce theme Says:

    I think the admin of this web page is really working hard
    in support of his website, for the reason that here
    every stuff is quality based information.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 102 other followers

%d bloggers like this: