Full Text Searching a FILESTREAM VARBINARY (MAX) Column

In the past I’ve written that Full Text Searching has the ability to index documents stored in a VARBINARY(MAX) field. However, I have never really gone into any details on how to do this. Today I will remedy that by demonstrating how to Full Text Seach not only using a VARBINARY(MAX) field, but one that has been stored using FILESTREAM. Even though these examples will be done against the data we’ve stored with FILESTREAM over the lessons from the last few days, know that this technique is identical for binary objects stored in a VARBINARY(MAX) field without using FILESTREAM.

Let’s start by creating a catalog to hold our Full Text data.


CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT;

Pretty normal, now we need to create a full text index on the “DocumentRepository” table we created in this series. When you look at the syntax though, you may notice a minor difference from the CREATE FULLTEXT INDEX examples I’ve shown in the past:


CREATE FULLTEXT INDEX ON dbo.DocumentRepository
(DocumentName, Document TYPE COLUMN DocumentExtension)
KEY INDEX PK__Document__3214EC277F60ED59
ON FileStreamFTSCatalog
WITH CHANGE_TRACKING AUTO;

Here you can see I am indexing two fields. The first is the “DocumentName”, which is passed in as the first parameter and looks like other examples. We won’t actually be using it in this example, however I included it to demonstrate you can index multiple columns even when one of them is a VARBINARY(MAX) column.

The second parameter indexes the VARBINARY(MAX) “Document” column itself, but notice the TYPE COLUMN after the column name. In order to Full Text Index a VARBINARY(MAX) column you must also have a column with the file extension in it. You then pass in the name of column after the TYPE COLUMN. In this example, the document extension is stored in the “DocumentExtension” column. Since the document extension can be stored in a column with any name, we let the Full Text engine know which column by passing it in after the TYPE COLUMN keyword. The remainder of the command is like other examples I’ve shown in the past.

Now we can run a normal SELECT…CONTAINS query against the “Document” field.


SELECT ID, DocumentName 
FROM dbo.DocumentRepository
WHERE CONTAINS(Document, 'Shrew');

I’ll leave it to you to run, for me it returned one row, with “TheTamingOfTheShrew.doc”. If you want to try it again, use “Elinor”, and you should get back “KingJohn.doc”.

As you can see, performing a Full Text Search against a VARBINARY(MAX) column is quite easy, all you have to do is indicate the document type by using the TYPE COLUMN. There are two more things you should know. First, the column containing the document extension must be of type CHAR, NCHAR, VARCHAR, or NVARCHAR. Second, the document type must be recognized by SQL Server. To get a list of all valid document types, simply query the fulltext_document_types catalog view like so:


SELECT * FROM sys.fulltext_document_types;

This will give you a list of all file extensions understood by SQL Server. Each row actually represents a filter. Each filter represents a DLL that implements the IFilter interface. It is possible to add additional filters to the system. For example, Microsoft offers the “Microsoft Filter Pack”. You may have noticed that out of the box SQL Server 2008 supports the older Office 2003 documents, but not the more recent Office 2007 formats. To add these newer formats to your SQL Server, Microsoft provides the afore mentioned filter pack. While installing it is beyond the scope of this aritcle you can find complete instructions for downloand and installation at http://support.microsoft.com/default.aspx?scid=kb;en-us;945934 .

The Full Text Search features provided by SQL Server continue to amaze me with how powerful they are, yet how easy they are to implment. With the information here you can easily search through documents stored in a VARBINARY(MAX) field, even when those documents are actually stored via the new SQL Server 2008 FILESTREAM.

7 thoughts on “Full Text Searching a FILESTREAM VARBINARY (MAX) Column

  1. It’s a long time in the future now but I’d just like to say thank you for these articles about FILESTREAM and full text search. They’ve worked brilliantly and save me hours of effort. So, thank you 🙂

  2. CREATE FULLTEXT INDEX ON dbo.DocumentRepository
    (DocumentName, Document TYPE COLUMN DocumentExtension)
    KEY INDEX PK__Document__3214EC277F60ED59
    ON FileStreamFTSCatalog
    WITH CHANGE_TRACKING AUTO;

    getting the following error while executing the above script

    Msg 7653, Level 16, State 1, Line 3
    ‘PK__Document__3214EC277F60ED59’ is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, does not have a filter, and has maximum size of 900 bytes. Choose another index for the full-text key.

  3. Awesome Article!

    @Ramesh – Use sp_help DocumentRepository to identify the clustered index name & then copy-paste it in the “KEY INDEX PK__Document__3214EC277F60ED59” line.

    You should be good to go!

  4. “Now we can run a normal SELECT…CONTAINS query against the “Document” field.

    SELECT ID, DocumentName
    FROM dbo.DocumentRepository
    WHERE CONTAINS(Document, ‘Shrew’);”

    Can’t we run CONTAINS query against “CONTENT” field here? What I want to know is that since with filestream files are saved in file system, while its reference is saved in DB, so is it possible to have a search on that data using SQL queries?

  5. Pingback: keyword searching

Leave a comment