Creating and Customizing Noise Words in SQL Server 2005 Full Text Search

SQL Server 2005 Full Text Search is a powerful feature. It will allow you to search for words in your text fields and get results back in a timely manner. However, certain words are so common they get in the way of effective searching. Having these words indexed would be worthless in terms of effective searching. Words like a, an, the, or, and so on. SQL Server 2005 refers to these words as “noise words”.

The list of noise words that ships with SQL Server 2005 is fine for most users. However, there are times when it can be an advantage to add words to the list. For example, let’s say your company has a rule that it’s name must appear in the header of all internal Word documents. Further, let’s say that all of those Word documents are stored in a varbinary(max) field and full text indexed. Your company name would effectively become a noise word, searching for it would return every record in your table and thus be useless in terms of results. You would then want to add your company name to the list of noise words for SQL Server 2005 to ignore.

Or, let’s take the opposite example. There was a word in the default SQL Server 2005 noise word list that, when in all caps, was an acronym for a piece of equipment in our plants. Thus it was getting omitted from searches. We had to remove that word from the noise word list.

So, how do we change the list of noise words in SQL Server 2005? First we have to track it down. Open up regedit and navigate to this spot in the tree.

HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]

My instance name is MSSQL.1. Pick your language, since I live in the US, my language will be enu, short for English, US. Don’t confuse it with eng, which is for our friends in Great Britain. Now look for the name NoiseFile. The value for mine is:

C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseenu.txt.

Use any text editor to open and look at the file, it’s a simple text file. Mine only had 129 words in it. Now simply add or remove the words you want (or don’t want) in the noise word list. When done, save the file. If you are using SQL Server 2005 Developer Edition on Vista, make sure to open the text file with admin privileges so you’ll be able to save your changes.

Now for the bad news. (Hey, you didn’t think it was going to be that easy did you?) In order for the noise word file to take effect you must repopulate the full text indexes. You can do the entire catalog at once, or one table at a time. To do the entire catalog, use the alter full text catalog command. For more info on this command, see my post on Getting Started with SQL Server 2005 Full Text Searching: Part 1 – The Catalog.

To change things on a table by table basis, you can use the alter full text index command. Again, rather than repeating myself I’ll refer you back to my post Getting Started with SQL Server 2005 Full Text Searching: Part 2 – The Indexes. Failure to rebuild your indexes will result in bad results. As new items are indexed, or old updated, the new noise word list will apply as SQL Server 2005 applies the full text index to the record. If old records are not updated, you’ll have records that should (or shouldn’t) be in the results and some unhappy users.

Bear in mind one important fact, changes to the noise word file apply to the entire SQL Server, and all the databases running on it. Thus if you include or exclude noise words to support one app, you are affecting every app that touches every full text indexed database on your server.

Finally, if you read my post yesterday on Thesaurus Entries in SQL Server 2005 and 2008, you may wonder why no mention of 2008 and noise words? In SQL Server 2008 noise words have been replaced with a concept called stopwords. I’ll be talking about stopwords in a post in the very near future.

About these ads

10 Responses to “Creating and Customizing Noise Words in SQL Server 2005 Full Text Search”

  1. Presenting SQL Server 2005 2008 Full Text Searching at TechMixer University « Arcane Code Says:

    [...] Full Text Searching Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search Creating and Customizing Noise Words in SQL Server 2005 Full Text Search Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search Advanced [...]

  2. SQL Server Full Text Searching at the Atlanta Code Camp « Arcane Code Says:

    [...] Full Text Searching Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search Creating and Customizing Noise Words in SQL Server 2005 Full Text Search Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search Advanced [...]

  3. Robert Says:

    Hi —

    I have a full-text indexed table, but I noticed that I don’t have the file noiseenu.txt anywhere in my SQL directory structure. I looked in %windir%/system32 and I found a NOISE.dat file.

    I’m sure noise words exist since i’m search my database the the abbreviation “ON”, but nothing comes up. I figure “on” is in the noise file.

    Can anyone suggest where I should look to find the noise file?

    I’m running MSSQL 2008 Express.

    Thanks

  4. RaiulBaztepo Says:

    Hello!
    Very Interesting post! Thank you for such interesting resource!
    PS: Sorry for my bad english, I’v just started to learn this language ;)
    See you!
    Your, Raiul Baztepo

  5. Robert Says:

    Found the solution. 2008 moved away from noise words; its now database based. See Robert Sheldon’s article on this. Its quite good.

    http://www.simple-talk.com/sql/learn-sql-server/understanding-full-text-indexing-in-sql-server/

  6. PiterKokoniz Says:

    Hi ! ^_^
    My name is Piter Kokoniz. Just want to tell, that your posts are really interesting
    And want to ask you: what was the reasson for you to start this blog?
    Sorry for my bad english:)
    Thank you!
    Your Piter

  7. Robert Says:

    Why not? :)

  8. Arcane Code Says:

    Piter,

    The main reason was I wanted to give back to the community. Over the years I have learned a lot from others and their blogs / websites, so I decided it was time for me to give back.

    I actually did an entire post on the benefits of blogging: http://arcanecode.com/2007/04/26/arcane-thoughts-benefits-of-blogging/ hopefully it will give you more insight on the many benefits I’ve found to blogging.

    Arcane

  9. Big Dan Says:

    Hi Arcane,

    I have a small problem I was wondering if you had encountered ?

    I need to know how to ignor certain characters from a full text search, so for example:-

    Search for ‘Greys’ would return ‘Greys’ AND ‘Grey’s’

    Or search for ‘xmen’ would return ‘x-men’ and vice versa.

    We have a cool stored proc that does some funky comparissons on rankings that I can put up if that helps?

    D.

  10. David Powel Says:

    Arcane, Great Stuff By the way ! Very good articles.
    I have sql server express 2005 sp2. So Its not possible to create noiseenu.txt files for different situations ? I work in an mro industrial environment. I was hoping to create a specific noise file for “Hand Tools” and “Safety Equipment” etc… and use a seperate noise file for each search ?
    Sounds as if the noise file Affects EVERY database I create in sql server expres 2005 ?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 102 other followers

%d bloggers like this: