Getting Started with SQL Server 2005 Full Text Searching: Part 5 – Advanced Searching

Last week we looked at how to search for data by passing a simple search word into a SQL Query that uses Full Text Searching. As you would expect, Full Text Searching is capable of much more powerful searches than a single word.

First, you can search for a phrase. To do so, you include the phrase inside double quotes.

SELECT WebSiteID, WebSiteName
FROM MyTableOfCoolWebsites
WHERE CONTAINS(*, ‘”Arcane Code”’)

Will find occurances of the phrase Arcane Code in your index. Let’s say however, you want to search for either Arcane or Code? FTS supports Boolean searches.

WHERE CONTAINS(*, ‘”Arcane” OR “Code”’)

Will return results where either the word Arcane or Code is in the text. If you want them both, use an AND in place of the OR. Additionally, FTS supports the AND NOT keyword, for when you want the first word but not the second.

Full Text Search also supports something called Proximity searches. In a proximity seach, one word appears near another.

WHERE CONTAINS(*, ‘arcane near code’)

So how near is near? Well the online docs are a bit vague on this, a safe guess is about ten words. You may also see this form of the syntax:

WHERE CONTAINS(*, ‘arcane ~ code’)

~ equates to near, although to me not nearly as readable, be aware though in case you run across it.

Finally, you should be aware that certain words are excluded from searches. Common words such as a, an, the, and so on. Microsoft refers to these as “Noise Words”. You can edit the list of noise words in case you have some words in your environment that wind up being Noise Words. Your company name might be one example.

I found the file in the folder C:\Program Files\Microsoft SQL Server\MSSQL.2\MSSQL\FTData (your milage may vary). The file is named noiseenu.txt. (ENU is for English US, not to be confused with noiseeng.txt which our friends in the British Isles will be using.)

This is also handy to know in case there is a reserverd word you need to remove from the list. In our environment one of the reserve words is also an abbreviation for a piece of our equipment, so I would want to remove this from our list.

Getting Started with SQL Server 2005 Full Text Searching: Part 4 – Valid Data Types

Hopefully everyone had some fun over the weekend with their digital cameras and listening to some good photo podcasts. Today we’ll return to the subject of Full Text Searching (FTS) under SQL Server 2005.

If you’re like me one of your questions is “OK, FTS is great, but exactly what data can I index with it?” Good question, a quck reference of the books online gives us the answer.

Char, nchar, varchar, nvarchar, text, ntext, xml, varbinary(max), and image are the valid data types for FTS. However, according to the documentation text, ntext, and image are going to be removed in a future version of SQL Server, so I’m going to avoid them and so should you.

Char, nchar, varchar, nvarchar are pretty straight forward, and probably expected, so I won’t go into them further. XML is pretty cool, as you may be aware with SQL Server 2005 Microsoft added the ability to store XML in the database. You can then do XML queries into these XML fields, a subject for a future post. However, MS also allows you to search inside these XML fields using Full Text Searching as well.

The final one that needs explanation then is varbinary(max). I’m sure you’re scratching you’re head over this, how can FTS work on binary data? Well remember, the Full Text Search engine is built on top of the Microsoft Indexing Service, which can scan inside Word, Excel, etc files stores on a server.

Thus FTS can actually look inside files stored inside varbinary(max) and if they have the proper extension FTS will look inside them and index the contents of the files. How sweet is that?