Creating and Customizing Noise Words in SQL Server 2005 Full Text Search

SQL Server 2005 Full Text Search is a powerful feature. It will allow you to search for words in your text fields and get results back in a timely manner. However, certain words are so common they get in the way of effective searching. Having these words indexed would be worthless in terms of effective searching. Words like a, an, the, or, and so on. SQL Server 2005 refers to these words as “noise words”.

The list of noise words that ships with SQL Server 2005 is fine for most users. However, there are times when it can be an advantage to add words to the list. For example, let’s say your company has a rule that it’s name must appear in the header of all internal Word documents. Further, let’s say that all of those Word documents are stored in a varbinary(max) field and full text indexed. Your company name would effectively become a noise word, searching for it would return every record in your table and thus be useless in terms of results. You would then want to add your company name to the list of noise words for SQL Server 2005 to ignore.

Or, let’s take the opposite example. There was a word in the default SQL Server 2005 noise word list that, when in all caps, was an acronym for a piece of equipment in our plants. Thus it was getting omitted from searches. We had to remove that word from the noise word list.

So, how do we change the list of noise words in SQL Server 2005? First we have to track it down. Open up regedit and navigate to this spot in the tree.

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]

My instance name is MSSQL.1. Pick your language, since I live in the US, my language will be enu, short for English, US. Don’t confuse it with eng, which is for our friends in Great Britain. Now look for the name NoiseFile. The value for mine is:

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseenu.txt.

Use any text editor to open and look at the file, it’s a simple text file. Mine only had 129 words in it. Now simply add or remove the words you want (or don’t want) in the noise word list. When done, save the file. If you are using SQL Server 2005 Developer Edition on Vista, make sure to open the text file with admin privileges so you’ll be able to save your changes.

Now for the bad news. (Hey, you didn’t think it was going to be that easy did you?) In order for the noise word file to take effect you must repopulate the full text indexes. You can do the entire catalog at once, or one table at a time. To do the entire catalog, use the alter full text catalog command. For more info on this command, see my post on Getting Started with SQL Server 2005 Full Text Searching: Part 1 – The Catalog.

To change things on a table by table basis, you can use the alter full text index command. Again, rather than repeating myself I’ll refer you back to my post Getting Started with SQL Server 2005 Full Text Searching: Part 2 – The Indexes. Failure to rebuild your indexes will result in bad results. As new items are indexed, or old updated, the new noise word list will apply as SQL Server 2005 applies the full text index to the record. If old records are not updated, you’ll have records that should (or shouldn’t) be in the results and some unhappy users.

Bear in mind one important fact, changes to the noise word file apply to the entire SQL Server, and all the databases running on it. Thus if you include or exclude noise words to support one app, you are affecting every app that touches every full text indexed database on your server.

Finally, if you read my post yesterday on Thesaurus Entries in SQL Server 2005 and 2008, you may wonder why no mention of 2008 and noise words? In SQL Server 2008 noise words have been replaced with a concept called stopwords. I’ll be talking about stopwords in a post in the very near future.

Advertisements