SQL Server Full Text Searching at SQL Saturday #22 Pensacola FL

Today I had the opportunity to hang out with a bunch of cool people at SQL Saturday 22 in Pensacola FL. My presentation is SQL Server Full Text Searching, a Guide for Dev’s and DBAs. While I’ve done this presentation in the past, it’s been updated with new material and demos. You’ll find all the code, and the slide deck at the Code Gallery site,

http://code.msdn.microsoft.com/SqlServerFTS

Thanks for coming!

Alabama Code Camp Mobile 2010

Last Saturday was the Alabama Code Camp, held in Mobile AL. For those unfamiliar with the Alabama Code Camps, we hold on average two a year, and they shift from city to city with different user groups acting as the host group. Other cities include Huntsville, Birmingham, and Montgomery. This time though the Lower Alabama Dot Net User Group under the leadership of Ryan Duclos hosted, and what a great event it was. Everything ran smoothly, there was plenty of drinks and pizza to go around, and some good swag to boot. A big congrats to Ryan and his team of volunteers for a great event, also thanks to Microsoft for sponsoring and the University of South Alabama for the venue.

I was kept busy at this code camp, doing three sessions. The first session was “Introduction to Microsoft PowerPivot”. The slide deck can be found at http://arcanecode.files.wordpress.com/2010/01/powerpivot_long.pdf. To see all my PowerPivot posts, simply pick it in the categories to the right or use this link: http://arcanecode.com/category/powerpivot/.

My second session was on Full Text Searching. You can find code samples and the PDF for the presentation at my code gallery site, http://code.msdn.microsoft.com/SqlServerFTS.

The final presentation was an introduction to Business Intelligence and Data Warehousing. Here is the link to the presentations slides in PDF format. As promised in the session I added the additional information for the Kimball Group book.

A quick apology for my delay in posting, a nasty head cold has had me in Zombie land since I got back. Thanks to all who attended, I appreciate you being very interactive, lots of questions, and very attentive. I look forward to the next time Mobile hosts the Alabama Code Camp.

SQL Saturday Redmond – October 3 2009

I am fortunate enough to be able to give three presentations at Redmond WA’s SQL Saturday event. The first session is “Introduciton to Data Warehousing / Business Intelligence”. Here is the PDF slide deck for that presentation. (Right click and save as if you want to save a copy for later reference).

The second presentation is SQL Server Full Text Searching. You can find the slide deck in PDF format as well as sample code at http://code.msdn.microsoft.com/SqlServerFTS.

The final presentation of the day was Introduction to SQL Server Integration Services. The sample project, slide deck, and step by step instructions can be found at http://code.msdn.microsoft.com/introssis . In addition I also showed how to call SSIS from a .Net application. You can find that sample at http://code.msdn.microsoft.com/ssisfromnet .

Full Text Searching a FILESTREAM VARBINARY (MAX) Column

In the past I’ve written that Full Text Searching has the ability to index documents stored in a VARBINARY(MAX) field. However, I have never really gone into any details on how to do this. Today I will remedy that by demonstrating how to Full Text Seach not only using a VARBINARY(MAX) field, but one that has been stored using FILESTREAM. Even though these examples will be done against the data we’ve stored with FILESTREAM over the lessons from the last few days, know that this technique is identical for binary objects stored in a VARBINARY(MAX) field without using FILESTREAM.

Let’s start by creating a catalog to hold our Full Text data.


CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT;

Pretty normal, now we need to create a full text index on the “DocumentRepository” table we created in this series. When you look at the syntax though, you may notice a minor difference from the CREATE FULLTEXT INDEX examples I’ve shown in the past:


CREATE FULLTEXT INDEX ON dbo.DocumentRepository
(DocumentName, Document TYPE COLUMN DocumentExtension)
KEY INDEX PK__Document__3214EC277F60ED59
ON FileStreamFTSCatalog
WITH CHANGE_TRACKING AUTO;

Here you can see I am indexing two fields. The first is the “DocumentName”, which is passed in as the first parameter and looks like other examples. We won’t actually be using it in this example, however I included it to demonstrate you can index multiple columns even when one of them is a VARBINARY(MAX) column.

The second parameter indexes the VARBINARY(MAX) “Document” column itself, but notice the TYPE COLUMN after the column name. In order to Full Text Index a VARBINARY(MAX) column you must also have a column with the file extension in it. You then pass in the name of column after the TYPE COLUMN. In this example, the document extension is stored in the “DocumentExtension” column. Since the document extension can be stored in a column with any name, we let the Full Text engine know which column by passing it in after the TYPE COLUMN keyword. The remainder of the command is like other examples I’ve shown in the past.

Now we can run a normal SELECT…CONTAINS query against the “Document” field.


SELECT ID, DocumentName 
FROM dbo.DocumentRepository
WHERE CONTAINS(Document, 'Shrew');

I’ll leave it to you to run, for me it returned one row, with “TheTamingOfTheShrew.doc”. If you want to try it again, use “Elinor”, and you should get back “KingJohn.doc”.

As you can see, performing a Full Text Search against a VARBINARY(MAX) column is quite easy, all you have to do is indicate the document type by using the TYPE COLUMN. There are two more things you should know. First, the column containing the document extension must be of type CHAR, NCHAR, VARCHAR, or NVARCHAR. Second, the document type must be recognized by SQL Server. To get a list of all valid document types, simply query the fulltext_document_types catalog view like so:


SELECT * FROM sys.fulltext_document_types;

This will give you a list of all file extensions understood by SQL Server. Each row actually represents a filter. Each filter represents a DLL that implements the IFilter interface. It is possible to add additional filters to the system. For example, Microsoft offers the “Microsoft Filter Pack”. You may have noticed that out of the box SQL Server 2008 supports the older Office 2003 documents, but not the more recent Office 2007 formats. To add these newer formats to your SQL Server, Microsoft provides the afore mentioned filter pack. While installing it is beyond the scope of this aritcle you can find complete instructions for downloand and installation at http://support.microsoft.com/default.aspx?scid=kb;en-us;945934 .

The Full Text Search features provided by SQL Server continue to amaze me with how powerful they are, yet how easy they are to implment. With the information here you can easily search through documents stored in a VARBINARY(MAX) field, even when those documents are actually stored via the new SQL Server 2008 FILESTREAM.

Gentleman, JumpstartTV Your Engines

Thought I’d spread a little link love today, and to start with I will point you to the http://jumpstarttv.com website. JumpstartTV hosts short training videos with one very specific, focused topic per video. When I say short, I mean short. Three to five minutes is the goal for each video. I was honored recently when asked to participate in the site, and have created a series for them on SQL Server Full Text Searching. The first video on installing was featured yesterday, but you don’t have to wait for the videos to be featured, you can see all of them by jumping to my JumpstartTV profile.

One thing to note, you will be asked to create an online profile. This is free, and it turns out very useful. You can use it to track all of the videos you watched. This makes it very convenient to come back later and refresh yourself on something you learned. In addition, the site has a “watch it later” feature. You can go all over the site picking out videos you think would be interesting and clicking the “watch it later” link. Then when you go to your profile, you’ll be able watch the selected videos one after the other. JumpstartTV has videos on both SQL Server and .Net, as well as some interesting ones in the “Misc” category, including bartending, self defense, and more.

The second link for the day is an interesting article on the simple-talk website, “Taking Back Control of your IT Career”. It was written by a friend of mine, Stephan Onisick and chronicles his ordeal of getting laid off from his company of seven years, through a period of retraing himself and ultimately landing a new job that met the needs he set out. Even if your company is nice and stable, you will find good advice for keeping your skills up in this article. Disclaimer, he does mention a presentation I did in the article, but in spite of that it’s still a good read. ;-)

Next is a new SQL Server resource brought to us by the fine folks at Quest Software, it’s the new SQLServerPedia. The site is both a wiki and a series of podcast like videos you can subscribe to from your Zune or other music player. I have my Zune setup to automagically download new episodes as they come out. I believe it was @BrentO himself who clued me in on the site.

I’ve written in the past about CodeRush, the tool I refuse to code without. Well the wonderful folks at Devexpress have created a free version called CodeRush Xpress for Visual Studio. Now if you need to code on a budget, you can still enjoy CodeRushy goodness in your 2008 IDE! And it’s not even Christmas yet!

Many of you follow me on Twitter, if you don’t I’d love to invite you, I”m on as @arcanecode . Guy Kawasaki has a great article on How To Pick Up Followers on Twitter. Good article that shows some of the strengths of Twitter, and how to use them to everyone’s advantage.

Speaking of Twitter, thanks to @theronkelso I found a new service called TweetLater. This service lets you schedule a tweet to be delivered to Twitter at a later time. For example, I would like to be able to tweet that our BSDA meeting is about to begin. But as the current President I’m usually up front introducing the guest speaker, and thus not at a keyboard. TweetLater to the rescue, I can set it to auto post the meeting is starting and be in two places at once.

It’s also great as a reminder tool, I can queue up meeting reminder tweets for the entire year ahead of time and forget all about it. Another feature, you can set it to auto reply with a message to new followers, and it can even be setup to automatically follow anyone who is following you. I believe this is a resource I’ll be using a lot.

The next to final link is a reminder really, to the Alabama Tech Events site. This is a community site for posting technical events of interest to folks in the state of Alabama. Please note that the event doesn’t have to be in Alabama, just of reasonable interest to folks in the state. We’ve posted events in Tennesee, Mississippi, Florida and Georgia. If you have a technical event contact me or one of the other user group leaders to get it added.

I’ll wrap up today’s link lovefest with the site analogous to the Alabama Tech Event site, but for the entire country: Community Megaphone. This site lists events from all over the United States. You can filter by state or event type.

Presenting SQL Server 2005 2008 Full Text Searching at the TechEd 2008 InBetween conference

On June 7th, 2008 I am presenting “Getting Started with SQL Server 2005/2008″ at the InBetween SQL Saturday conference that is taking place between the two weeks of Microsoft TechEd. This post will have all the links relevant to my talk.

Detailed descriptive material can be found on my blog, ArcaneCode.com, with links to specific posts below.

First off, the slides and sample code can be located at the Code Gallery site I setup specifically for Full Text Searching with SQL Server:

http://code.msdn.microsoft.com/SqlServerFTS

Look on the downloads page to see various projects around SQL Server Full Text Searching. I’ve created one “release” for each of the projects around FTS. Be sure to look on the right side at the various releases in order to see the various projects.

Next, you can get started with the basics by reading these entries on my blog:

Lesson 0 – Getting the Bits to do Full Text Searching in SQL Server 2005
Lesson 1 – The Catalog
Lesson 2 – The Indexes
Lesson 3 – Using SQL
Lesson 4 – Valid Data Types
Lesson 5 – Advanced Searching

After that you’ll be ready for some advanced topics.

Can you hear me now? Checking to see if FTS is installed.
Exploring SQL Servers FullTextCatalogProperty Function
Using the ObjectPropertyEx Function
Using FORMSOF in SQL Server Full Text Searching
Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search
Creating and Customizing Noise Words in SQL Server 2005 Full Text Search
Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search
Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists

Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists

Yesterday I covered the basics around StopWords/StopLists in SQL Server 2008. Today we’ll look at some advanced queries you can use to get more information back about FullText Indexes, StopWords, and StopLists.

The first query simply returns a list of all of the catalogs in the system.

– Get current list of full text catalogs

select [name] as CatalogName

    , path

    , is_default

  from sys.fulltext_catalogs

 order by [name];

 

The next query returns a list of all the StopLists.

 – Get the list of StopLists

 select stoplist_id

      , name

   from sys.fulltext_stoplists;

 

This query returns a list of StopWords in the database. Note the linking to get the associated StopList name and language.

 – Get list of StopWords

 select sl.name as StopListName

      , sw.stopword as StopWord

      , lg.alias as LanguageAlias

      , lg.name  as LanguageName

      , lg.lcid  as LanguageLCID

   from sys.fulltext_stopwords sw

   join sys.fulltext_stoplists sl

    on sl.stoplist_id = sw.stoplist_id

   join master.sys.syslanguages lg

    on lg.lcid = sw.language_id;

 

This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement, you can not do this in SQL Server 2005.

– Get a list of the System provided stopwords  

select ssw.stopword

    , slg.name

  from sys.fulltext_system_stopwords ssw

  join sys.fulltext_languages slg

    on slg.lcid = ssw.language_id;

 

My next query returns a list of all the Full Text Indexes in the database.

– List full text indexes

select c.name as CatalogName

    , t.name as TableName

    , idx.name as UniqueIndexName

    , case i.is_enabled

        when 1 then ‘Enabled’

        else ‘Not Enabled’

       end as IsEnabled

    , i.change_tracking_state_desc

    , sl.name as StopListName

  from sys.fulltext_indexes i

  join sys.fulltext_catalogs c

    on i.fulltext_catalog_id = c.fulltext_catalog_id

  join sys.tables t

    on i.object_id = t.object_id

  join sys.indexes idx

    on i.unique_index_id = idx.index_id

       and i.object_id = idx.object_id

  left join sys.fulltext_stoplists sl

    on sl.stoplist_id = i.stoplist_id

 

 

This query returns a list of all the document types SQL Server 2008 understands when they are placed in a varbinary(max) field.

– List all of the document types SQL Server 2008 will understand in varbinary(max) field

select document_type

    , path

    , [version]

    , manufacturer

  from sys.fulltext_document_types;

 

If your full text performance begins to suffer over time, you might want to check and see how many fragments exist. If you have multiple closed fragments, you should consider doing a REORGANIZE on the index (using alter fulltext index). This query will tell you how many fragments exist for your full text index.

– See how many fragments exist for each full text index.

– If multiple closed fragments exist for a table do a REORGANIZE to help performance

select t.name as TableName

    , f.data_size

    , f.row_count

    , case f.status

        when 0 then ‘Newly created and not yet used’

        when 1 then ‘Being used for insert’

        when 4 then ‘Closed ready for query’

        when 6 then ‘Being used for merge inpurt and ready for query’

        when 8 then ‘Marked for deletion. Will not be used for query and merge source’

        else ‘Unknown status code’

       end

  from sys.fulltext_index_fragments f

  join sys.tables t on f.table_id = t.object_id;

 

There you go, a handful of powerful queries to help you query and maintain the state of your full text indexes.

Follow

Get every new post delivered to your Inbox.

Join 103 other followers