It’s here it’s here! SQL Server 2008 RTM is now on MSDN. I’m downloading it now. Only downside, to use Visual Studio 2008 with it you need VS2008 SP1, and that hasn’t released yet. So be cautious if you install and desire to use VS2008 with it.
Category: SQL Server
Arcane Review: Expert SQL Server 2005 Integration Services
If you recall my “Good Reads” post from June 25th, you will remember I am a big believer in books as a learning medium. I like to employ a lot of different ways to learn: user groups, blogs, podcasts, videocasts, and magazines to name a few. But for really in depth coverage, it’s hard to beat a nice book in your hands. I got some good feedback from my mention last week of Andy Leonard’s new e-book on Data Dude, so I thought that I would continue by adding book reviews to the blog every so often.
For this review I thought I’d cover a book that seems to constantly be on my desk lately: Expert SQL Server 2005 Integration Services, by Brian Knight and Eric Veerman. This book does a really good job and is specifically targeted toward the data warehousing professional. One entire chapter is devoted to ETL for dimension tables; another chapter focuses on the fact tables. It was great to have coverage so focused on these topics.
Another favorite part of the book is the two chapters on deploying and managing SSIS packages. So often these topics are glossed over, especially the managing piece. The book does a great job in covering all the tools and practices around this subject. I’ll mention one more chapter, one that focuses on package reliability. They cover logging, auditing, event handling, checkpoint files, and even suggestions on testing error handling logic.
There are many more chapters in the book, such as migration from DTS (SQL Server 2000) and Scalability, for you to discover. The other thing I love about this book is the brevity. The authors cover an amazing amount of information in just 382 pages. As a busy, busy person I very much appreciate the conciseness they achieved without sacrificing any clarity.
I’ve met both authors, and have heard them speak. They are both very nice, knowledgeable individuals, and I highly encourage you to attend one of their presentations if you get the chance, or if not at least buy their book from your favorite retailer; you will find it a great investment.
SQL Heroes and Data Dudes
In case you are wondering where I’ve been lately, it’s been a combination of issues at work plus putting the final nails in the schedule for TechMixer University. It’s been a big team effort, and as chair of the education committee I’ve had some great help gathering speakers for the event. We just published the first round of speakers and tracks so be sure to go take a look. And if you haven’t signed up yet, you better hurry registration is rapidly filling up!
On the interesting SQL News front, thanks to Jason over at StatisticsIO I just found out about the SQL Heroes site. They are having a contest for the best SQL Server 2008 community project submitted to CodePlex. Here’s a listing of the current SQL projects currently there. Looks like some useful utilities, and the contest still has a month left in it so plenty of time to assemble your own project and submit to the contest.
I recently began diving deeper into Visual Studio Team System 2008 Database Edition (aka “Data Dude”) and wanted to learn more about it. Fellow SQL MVP Andy Leonard has come out with a new book, the first in a series on the subject. Volume 1, available from Solid Quality Mentors, takes you from the basics of creating your first database project, into versioning your database and scripts, right into building and deploying. There’s also some interesting and useful material in the appendixes, including installation of data dude and importing a database schema. I was also interested in getting a peak at his development environment. And at a mere 15 dollars I thought it was a bargain.
The SQL Server Developer
This is a subject I’ve been thinking about for quite a while; perhaps others are drawing similar conclusions. I may even be late to the game, but if so I haven’t seen it discussed on the blogs or podcasts, and I keep up with these pretty regularly. After a lot of consideration, I’ve decided there is a new type of IT professional, the SQL Server Developer, of which I consider myself one.
Let’s start out with a basic definition. What is a SQL Server Developer? In my mind they fall into two categories. The first is the developer who works with the SQL Server Business Intelligence (SSBI) tools, namely SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), or SQL Server Analysis Services (SSAS). The second is the type of developer who works in the server end, developing stored procedures in both T-SQL and CLR, scripts, designing tables and views, and other tasks not centered on the day to day activities around the actual running of the Server itself. In many organizations these two areas are covered by the same person.
So what has caused this new breed of IT professional to emerge? Two reasons as I see it. First is the introduction of SQL Server 2005 itself. It brought along a new flood of tools, many outside the experiences of the typical DBA. The ability to write CLR inside the database is very new to DBAs, most of whom have no experience with .Net coding. Note this is in no way any sort of knock against DBAs, I would not expect one to have any experience with it. Likewise with many of the other tools.
The bigger reason though is Sarbanes Oxley. For a complete background see the Wikipedia article on Sarbanes Oxley, but in brief “SOX” is a US law that makes the leaders of publically held companies accountable for the financial dealings in their company. Auditors are responsible for ensuring compliance. As a result, most corporations have put in place rules in IT that place a wall between production systems and the developers who created those systems. In my own company’s environment, and those of many others I speak with, this means the DBAs are no longer allowed to develop code. No table designs, to stored procedures, etc. They are able to develop scripts if they are used in maintaining the health of the server; those are OK because financial decisions are not being made based on those scripts.
Somebody then, had to step in and fill the gap. In many companies since these were considered development tasks the coding fell to the development group. In other organizations DBAs were divided into production DBAs and development DBAs. In either case these folks are responsible for developing solutions to business issues, and are not responsible (at least not directly) for the day to day running of the server.
Now that you understand what a SQL Server Developer is and why they came into existence, you may be asking what the point of this article is? Well, I suppose it’s a plea of sorts. I see a lot of activities / training for both the DBA and the .Net pro, but little for the SS Dev. Even Tech-Ed this year demonstrated the schizophrenia when it split the event in two. There were just as many events in the Dev week as there were in the IT Pro week that applied to the SS Dev. Don’t get me wrong, I have seen training videos, mostly from Microsoft, that cover the technologies involved. But little that talk about the overall experiences that a SS Dev. In addition, almost every book I read assumes the reader comes from a DBA background. Doing so only covers half of the target audience; keep in mind there’s a lot of us who came from a .Net background.
So what would I like to see? Well to begin with, books that don’t assume everyone has the same background. Next I’d like to see more events targeted at the SQL Server Developer. Here in Birmingham we’re planning on a SQL Saturday next spring, I’d like to see many sessions devoted to the SS Dev. Finally, there seems to be very little software, outside the tools that ship from Microsoft, to assist the SS Dev. RedGate has some nice tools, and I’ve just started investigating the ApexSQL tools, most tools seem to target the DBA primarily though. It’d be nice to see collections and offerings more targeted at development.
What can you do? Well if you recognize yourself as a SQL Server Developer, start referring to yourself as such. Talk to Microsoft and vendors, start bringing the gap to them, ask them to start providing tools and events to cover our needs. Finally, evangelize! Do presentations, blog, whatever it takes to let the world know there’s a new breed of IT Professional out there.
Interview with Andy Warren on his Birmingham Visit
After the SQL Saturday event Inbetween the TechEds I had a moment to interview SQL Server guru Andy Warren about his upcoming trip to Birmingham Alabama. Andy will be speaking at the Steel City SQL group on June 17th and holding an all day class on on Performance Tuning on June 18th. You can find my interview with him, in MP3 format, here:
Note the audio is raw and uncut, but will give you all of the information you need.
Presenting SQL Server 2005 2008 Full Text Searching at the TechEd 2008 InBetween conference
On June 7th, 2008 I am presenting “Getting Started with SQL Server 2005/2008” at the InBetween SQL Saturday conference that is taking place between the two weeks of Microsoft TechEd. This post will have all the links relevant to my talk.
Detailed descriptive material can be found on my blog, ArcaneCode.com, with links to specific posts below.
First off, the slides and sample code can be located at the Code Gallery site I setup specifically for Full Text Searching with SQL Server:
http://code.msdn.microsoft.com/SqlServerFTS
Look on the downloads page to see various projects around SQL Server Full Text Searching. I’ve created one “release” for each of the projects around FTS. Be sure to look on the right side at the various releases in order to see the various projects.
Next, you can get started with the basics by reading these entries on my blog:
Lesson 0 – Getting the Bits to do Full Text Searching in SQL Server 2005
Lesson 1 – The Catalog
Lesson 2 – The Indexes
Lesson 3 – Using SQL
Lesson 4 – Valid Data Types
Lesson 5 – Advanced Searching
After that you’ll be ready for some advanced topics.
Can you hear me now? Checking to see if FTS is installed.
Exploring SQL Servers FullTextCatalogProperty Function
Using the ObjectPropertyEx Function
Using FORMSOF in SQL Server Full Text Searching
Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search
Creating and Customizing Noise Words in SQL Server 2005 Full Text Search
Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search
Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists
Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists
Yesterday I covered the basics around StopWords/StopLists in SQL Server 2008. Today we’ll look at some advanced queries you can use to get more information back about FullText Indexes, StopWords, and StopLists.
The first query simply returns a list of all of the catalogs in the system.
— Get current list of full text catalogs
select [name] as CatalogName
, path
, is_default
from sys.fulltext_catalogs
order by [name];
The next query returns a list of all the StopLists.
— Get the list of StopLists
select stoplist_id
, name
from sys.fulltext_stoplists;
This query returns a list of StopWords in the database. Note the linking to get the associated StopList name and language.
— Get list of StopWords
select sl.name as StopListName
, sw.stopword as StopWord
, lg.alias as LanguageAlias
, lg.name as LanguageName
, lg.lcid as LanguageLCID
from sys.fulltext_stopwords sw
join sys.fulltext_stoplists sl
on sl.stoplist_id = sw.stoplist_id
join master.sys.syslanguages lg
on lg.lcid = sw.language_id;
This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement, you can not do this in SQL Server 2005.
— Get a list of the System provided stopwords
select ssw.stopword
, slg.name
from sys.fulltext_system_stopwords ssw
join sys.fulltext_languages slg
on slg.lcid = ssw.language_id;
My next query returns a list of all the Full Text Indexes in the database.
— List full text indexes
select c.name as CatalogName
, t.name as TableName
, idx.name as UniqueIndexName
, case i.is_enabled
when 1 then ‘Enabled’
else ‘Not Enabled’
end as IsEnabled
, i.change_tracking_state_desc
, sl.name as StopListName
from sys.fulltext_indexes i
join sys.fulltext_catalogs c
on i.fulltext_catalog_id = c.fulltext_catalog_id
join sys.tables t
on i.object_id = t.object_id
join sys.indexes idx
on i.unique_index_id = idx.index_id
and i.object_id = idx.object_id
left join sys.fulltext_stoplists sl
on sl.stoplist_id = i.stoplist_id
This query returns a list of all the document types SQL Server 2008 understands when they are placed in a varbinary(max) field.
— List all of the document types SQL Server 2008 will understand in varbinary(max) field
select document_type
, path
, [version]
, manufacturer
from sys.fulltext_document_types;
If your full text performance begins to suffer over time, you might want to check and see how many fragments exist. If you have multiple closed fragments, you should consider doing a REORGANIZE on the index (using alter fulltext index). This query will tell you how many fragments exist for your full text index.
— See how many fragments exist for each full text index.
— If multiple closed fragments exist for a table do a REORGANIZE to help performance
select t.name as TableName
, f.data_size
, f.row_count
, case f.status
when 0 then ‘Newly created and not yet used’
when 1 then ‘Being used for insert’
when 4 then ‘Closed ready for query’
when 6 then ‘Being used for merge inpurt and ready for query’
when 8 then ‘Marked for deletion. Will not be used for query and merge source’
else ‘Unknown status code’
end
from sys.fulltext_index_fragments f
join sys.tables t on f.table_id = t.object_id;
There you go, a handful of powerful queries to help you query and maintain the state of your full text indexes.
Creating and Customizing Noise Words / StopWords in SQL Server 2008 Full Text Search
While SQL Server 2005 had Noise Words for its full text search, SQL Server 2008 has moved to StopWords. The good news is the change is much deeper than just a rebranding. I say good news, because with the change comes a lot more flexibility and functionality.
There are actually two new tools introduced with SQL Server 2008, StopWords and StopList. A StopList acts as a named container for a group of StopWords. You can then associate a StopList with one or more tables. This is a great enhancement over Noise Words, which applied to the entire server. Now you can associate a group of StopWords, in a StopList, with specific tables without affecting the rest of the tables on the database or server.
First, let’s run a query to demonstrate. I’ll be doing all my testing against the AdventureWorks2008 database. If you don’t have it, you can download it at the CodePlex site. I’ve also used the standard Create Catalog command to create a full text catalog named AdvWorksCatalog. I then put a full text index on the Production.ProductDescription field. That done, here is a query I’ve used for examples in the past.
–Demo 1 – Query the Index
use AdventureWorks2008;
select [Name], ProductNumber, [Description]
from [Production].[Product] p
, [Production].[ProductDescription] pd
, [Production].[ProductModelProductDescriptionCulture] pmpdc
where p.ProductModelID = pmpdc.ProductModelID
and pmpdc.ProductDescriptionID = pd.ProductDescriptionID
and CONTAINS(pd.[Description], ‘shifting’)
Now we need to create the stoplist. Here’s we’ll use the first of the new SQL Server 2008 commands, create fulltext stoplist.
— Demo 2 – Create Stoplist
use AdventureWorks2008;
create fulltext stoplist ArcanesStopList;
The StopList will act as a holder for a specific set of words that we want to ignore. We refer to that group of words by the name we gave it, ArcanesStopList. Now we need to add some words to the list. Here are two ways to do so, both use the alter fulltext stoplist command.
— Demo 3
— Add StopWord
use AdventureWorks2008;
alter fulltext stoplist ArcanesStopList
add ‘shifting’ language 1033;
alter fulltext stoplist ArcanesStopList
add ‘light’ language ‘English’;
The command is straight forward, use the alter fulltext stoplist, give it the name of the list you want to add a word to. Then comes the word add, followed by the word you want to add. Next you have to specify the language. There are two ways to specify the language, either by using the language ID (in my case 1033) or the name for the language. If you don’t know your language, just use this query to find it:
— Here’s how to find your language
select [name], alias, lcid from master.sys.syslanguages;
If you were to jump the gun and rerun the query from Demo 1, you’d think it would now ignore the word shifting since we just added it as a StopWord to our StopList. But there’s still one more step.
You need to attach your StopList to a table that has a full text index on it. This is a major improvement over 2005. SQL Server 2008 now allows you to get quite granular with the application of custom groups of words. You are limited to one StopList per table, however. One StopList can be applied to multiple tables. Here’s the code to associate our StopList with a table:
— Demo 4 – Add the StopList to the Full Text Index
use AdventureWorks2008;
alter fulltext index on [Production].[ProductDescription]
set stoplist ArcanesStopList;
The alter fulltext command has been modified to add a set stoplist set of keywords. All we need to do is specify the tablename and the StopList to associate with that table. Now go run the query from Demo 1. You should get back zero rows.
Congratulations! You’ve now associated your StopList with the full text index..
I’m sure however you don’t want to leave it this way, so let’s look at what it will take to clean up the mess. First, you can decide you no longer want the StopList associated with the full text index. Time to use the alter command again.
— Demo 4-2 – Remove the StopList to the Full Text Index
use AdventureWorks2008;
alter fulltext index on [Production].[ProductDescription]
set stoplist system;
Setting the stoplist to the keyword system will change from your custom stoplist to the system stoplist. You can also use the word off instead of system to turn off StopWords all together for the specified table. Now would be a good time to mention, if you want to use the standard system set of StopWords, instead using a custom set, use the system keyword, as you see above.
There may be times when you want to remove just a word or two from a StopList, but not disassociate the entire list. Its possible to easily remove individual words from the list.
— Demo 4-3 – Remove single word from Stoplist
use AdventureWorks2008;
alter fulltext stoplist ArcanesStopList
drop ‘shifting’ language 1033;
alter fulltext stoplist ArcanesStopList
drop ‘light’ language ‘English’;
The syntax for the drop is identical to the add, except of course for using the word drop instead of add.
Finally, you may want to drop the StopList all together. There’s a drop command for it as well.
— Demp 5-3 – Drop StopList
use AdventureWorks2008;
drop fulltext stoplist ArcanesStopList;
This covers the basic use of StopWords / StopLists in 2008. I believe the ability to associate a specific set of words with one or more specific tables will give your searches greater power and flexibility.
Creating and Customizing Noise Words in SQL Server 2005 Full Text Search
SQL Server 2005 Full Text Search is a powerful feature. It will allow you to search for words in your text fields and get results back in a timely manner. However, certain words are so common they get in the way of effective searching. Having these words indexed would be worthless in terms of effective searching. Words like a, an, the, or, and so on. SQL Server 2005 refers to these words as “noise words”.
The list of noise words that ships with SQL Server 2005 is fine for most users. However, there are times when it can be an advantage to add words to the list. For example, let’s say your company has a rule that it’s name must appear in the header of all internal Word documents. Further, let’s say that all of those Word documents are stored in a varbinary(max) field and full text indexed. Your company name would effectively become a noise word, searching for it would return every record in your table and thus be useless in terms of results. You would then want to add your company name to the list of noise words for SQL Server 2005 to ignore.
Or, let’s take the opposite example. There was a word in the default SQL Server 2005 noise word list that, when in all caps, was an acronym for a piece of equipment in our plants. Thus it was getting omitted from searches. We had to remove that word from the noise word list.
So, how do we change the list of noise words in SQL Server 2005? First we have to track it down. Open up regedit and navigate to this spot in the tree.
HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]
My instance name is MSSQL.1. Pick your language, since I live in the US, my language will be enu, short for English, US. Don’t confuse it with eng, which is for our friends in Great Britain. Now look for the name NoiseFile. The value for mine is:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseenu.txt.
Use any text editor to open and look at the file, it’s a simple text file. Mine only had 129 words in it. Now simply add or remove the words you want (or don’t want) in the noise word list. When done, save the file. If you are using SQL Server 2005 Developer Edition on Vista, make sure to open the text file with admin privileges so you’ll be able to save your changes.
Now for the bad news. (Hey, you didn’t think it was going to be that easy did you?) In order for the noise word file to take effect you must repopulate the full text indexes. You can do the entire catalog at once, or one table at a time. To do the entire catalog, use the alter full text catalog command. For more info on this command, see my post on Getting Started with SQL Server 2005 Full Text Searching: Part 1 – The Catalog.
To change things on a table by table basis, you can use the alter full text index command. Again, rather than repeating myself I’ll refer you back to my post Getting Started with SQL Server 2005 Full Text Searching: Part 2 – The Indexes. Failure to rebuild your indexes will result in bad results. As new items are indexed, or old updated, the new noise word list will apply as SQL Server 2005 applies the full text index to the record. If old records are not updated, you’ll have records that should (or shouldn’t) be in the results and some unhappy users.
Bear in mind one important fact, changes to the noise word file apply to the entire SQL Server, and all the databases running on it. Thus if you include or exclude noise words to support one app, you are affecting every app that touches every full text indexed database on your server.
Finally, if you read my post yesterday on Thesaurus Entries in SQL Server 2005 and 2008, you may wonder why no mention of 2008 and noise words? In SQL Server 2008 noise words have been replaced with a concept called stopwords. I’ll be talking about stopwords in a post in the very near future.
Creating Custom Thesaurus Entries in SQL Server 2005 and 2008 Full Text Search
SQL Server Full Text Search has the ability to search not just for a word, but for various forms of a word based upon the built in thesaurus. To use this ability in your own full text searches, you simply use the FormsOf clause. Here’s an example using the AdventureWorks database, with a full text index on the product description table’s Description field.
select [Name], ProductNumber, [Description]
from [Production].[Product] p
, [Production].[ProductDescription] pd
, [Production].[ProductModelProductDescriptionCulture] pmpdc
where p.ProductModelID = pmpdc.ProductModelID
and pmpdc.ProductDescriptionID = pd.ProductDescriptionID
and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, light)’ )
One of the biggest questions I get asked is, how do we customize the Thesaurus file used by SQL Server? Today we’ll look at how to do just that.
The first step is tracking it down. SQL Server stores the location of the thesaurus in the registry. The location depends on whether you are in SQL Server 2005 or 2008. For SQL Server 2005, open up regedit or your favorite editor and navigate to:
HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]
In my case, my instance was named MSSQL.1. I live in the US, so my language is “enu”, which stands for English, US. Don’t get this confused with “eng”, that is for our friends in the United Kingdom. (Unless of course that’s where you live in which case do indeed pick eng). There are around 17 supported languages out of the box.
OK, once you’ve navigated to the right spot in the registry tree, look for the key named TsaurusFile. Mine is listed as being located at:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\tsenu.xml
In SQL Server 2008 you have to go to two places. First you need to find the path. For that, again open up regedit and navigate to:
HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSQLServer
For me, the instance name was MSSQL10.MSSQLSERVER. The name you should look for is “FullTextDefaultPath”. The structure was very similar to the 2005 version:
C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\FTData\
Now to find the file name. Jump up the registry tree a little to:
HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]
Just like with 2005, use ENU if you are in the US, or your appropriate language abbreviation. Looking at the name “TsaurusFile”, it probably comes as no surprise to discover the name is “tsenu.xml”.
Now that we’ve located it, let’s navigate to the folder and open it up in your favorite text editor (which if you’ve been reading my past blog posts is ANYTHING other than Notepad). Let’s take a look:
<XML ID=”Microsoft Search Thesaurus”>
<!– Commented out
<thesaurus xmlns=”x-schema:tsSchema.xml”>
<diacritics_sensitive>0</diacritics_sensitive>
<expansion>
<sub>Internet Explorer</sub>
<sub>IE</sub>
<sub>IE5</sub>
</expansion>
<replacement>
<pat>NT5</pat>
<pat>W2K</pat>
<sub>Windows 2000</sub>
</replacement>
<expansion>
<sub>run</sub>
<sub>jog</sub>
</expansion>
</thesaurus>
–>
</XML>
Note there is one minor difference between the 2005 version, shown above, and the 2008 version. In 2008, the <diacritics_sensitive> line is changed to:
<diacritics = false/>
In either case, you won’t need to adjust this line. One other note, if you are editing this in Vista, be sure to launch your text editor in Run As Administrator mode so you can save the changes.
The first thing to notice is the file is commented out. We’ll want to uncomment as the first step. Next you’ll see the <thesaurus…> section. We won’t need to change it.
The next three sections are examples. Ultimately you’ll delete and replace with your own, but let’s take a moment to look at what is there. The first section is an <expansion>…</expansion> tag. With an expansion, all terms are equivalent. With the expansion tags, if the user enters any one of those terms, it’s the same as if they’d entered all of the terms. Thus in the first example, if a user were to type in “Internet Explorer”, a full text search would return all records that contained “Internet Explorer”, “IE”, or “IE5”.
Replacements are something you’ll use less often. With Replacements, SQL Server does not look for the word in the <pat> (pattern) tag, instead it looks for the word in the <sub> tag. In this case if a user types in W2K, the full text search engine will instead look for “Windows 2000”. The only true life situation I can think of where this would be useful is addresses. If you know your system converts all street or state abbreviations to their full expanded name, then you could use this as a trap of sort with full text searching. For example:
<replacement>
<pat>St.</pat>
<pat>Str.</pat>
<sub>Street</sub>
</replacement>
Thus a user typing in Elm St. would be able to find Elm Street in your system. You could also expand this with states. Let me reiterate this example assumes your source system automatically replaces all abbreviations with their full words. Using a replacement saves the full text search engine for looking for words that won’t be there. Of the two, expansions will probably be the one you use most of the time.
So let’s create a simple example to test out our custom thesaurus. Open up your tsenu.xml file, and let’s change it to:
<XML ID=”Microsoft Search Thesaurus”>
<thesaurus xmlns=”x-schema:tsSchema.xml”>
<diacritics_sensitive>0</diacritics_sensitive>
<expansion>
<sub>light</sub>
<sub>TheDoctor</sub>
</expansion>
</thesaurus>
</XML>
(Again, in 2008 the diacritics line will be slightly different, just use what 2008 put out there for you.) Here I’m going to make the word light and the word “TheDoctor” substitutable for each other. (I’m listening to the Doctor Who soundtrack as I type this, in case you were wondering. I wanted a word I knew I wouldn’t find in the Adventure Works database.) Save the changes.
Unfortunately just saving the changes is not enough to make the full text search engine pick up the changes to our thesaurus file. We have to tell SQL Server the file has been updated. The procedure differs greatly, depending on whether you run SQL Server 2005 or 2008.
In SQL Server 2005, you unfortunately must restart the entire full text search service. In a production environment, you never want to do this when the engine is busy or in use. I have a recent blog post that talks about the SQL Server Full Text Catalog Property function. In it, I have this handy query for you to be able to tell what the full text service is doing.
select case FullTextCatalogProperty(‘AdvWorksCatalog’, ‘PopulateStatus’)
when 0 then ‘Idle’
when 1 then ‘Full population in progress’
when 2 then ‘Paused’
when 3 then ‘Throttled’
when 4 then ‘Recovering’
when 5 then ‘Shutdown’
when 6 then ‘Incremental population in progress’
when 7 then ‘Building index’
when 8 then ‘Disk is full. Paused.’
when 9 then ‘Change tracking’
else ‘Error reading FullTextCatalogProperty PopulateStatus’
end
As long as it returns “Idle” you are clear to reset. Even so, I would make sure to do this during a database outage, especially with a production database. You’ve been warned!
To reset, go to Start, Control Panel, Administrative Tools, Services. Scroll down to SQL Server Full Text Search (MSSQLSERVER), right click and Restart. That should get it for SQL Server 2005.
In SQL Server 2008 it’s a good deal simpler. Make sure you are in the right database for your catalog (in this case AdventureWorks) and issue the command:
exec sys.sp_fulltext_load_thesaurus_file 1033;
go
The 1033 on the end refers to the local identifier (LCID) for the language of your thesaurus file. 1033 is for English, US. To discover your LCID, use this simple query:
select [name], alias, lcid from master.sys.syslanguages
OK, we’re at the finish line. Assuming your full text engine has restarted, it should have picked up your new, customized thesaurus file. Now we can alter our original query to:
select [Name], ProductNumber, [Description]
from [Production].[Product] p
, [Production].[ProductDescription] pd
, [Production].[ProductModelProductDescriptionCulture] pmpdc
where p.ProductModelID = pmpdc.ProductModelID
and pmpdc.ProductDescriptionID = pd.ProductDescriptionID
and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, TheDoctor)’ )
And we should get back the same results as the original query.
Please note that all remarks for SQL Server 2008 are based upon the February 2008 CTP 6 release. As future release candidates and final editions are released, some of the information could change. Be sure to check books on line for the latest info.
A final thing to note, if you read yesterday’s post you will note that the FullText keyword uses the thesaurus as part of it’s operation. If you have a custom thesaurus, it’s entries will also be used by FullText.
As you can see, adding custom entries to the SQL Server Full Text Search Thesaurus is not hard, there’s just some steps you need to follow in order to make it happen. Once you know how, you can use the functionality to make searches for your users more productive.
Using FormsOf in SQL Server Full Text Searching
In the past I’ve talked about some advanced text searching techniques for SQL Server Full Text Searching. Another you can take advantage of is FormsOf. Let’s say we’ve setup a full text search index on the Adventure Works ProductDescription tables’ Description field. (For more info on how to do this, go to the Arcane Lessons page of this site and scroll down to the “Getting Started With SQL Server 2005 Full Text Searching” section.)
FormsOf has two ways to use it, the Inflectional mode and the Thesaurus mode. Let’s look at an example, and then I’ll explain the differences. For this example, we’ll combine data from several tables to get back some meaningful information from the AdventureWorks database.
— Example 1 – FORMSOF INFLECTION
select [Name], ProductNumber, [Description]
from [Production].[Product] p
, [Production].[ProductDescription] pd
, [Production].[ProductModelProductDescriptionCulture] pmpdc
where p.ProductModelID = pmpdc.ProductModelID
and pmpdc.ProductDescriptionID = pd.ProductDescriptionID
and CONTAINS(pd.[Description], ‘FORMSOF(Inflectional, light)’ )
— Example 2 – FORMSOF THESAURUS
select [Name], ProductNumber, [Description]
from [Production].[Product] p
, [Production].[ProductDescription] pd
, [Production].[ProductModelProductDescriptionCulture] pmpdc
where p.ProductModelID = pmpdc.ProductModelID
and pmpdc.ProductDescriptionID = pd.ProductDescriptionID
and CONTAINS(pd.[Description], ‘FORMSOF(Thesaurus, light)’ )
Both examples use the same, I admit, rather bizarre syntax. Inside the string you pass to the CONTAINS clause, you put FORMSOF, with the word Inflectional or Thesaurus, a comma, then the word or phrase you want to search form. In this example I use Contains, but FormsOf also works with FreeText as well.
So what is the difference between Inflectional and Thesaurus? Inflectional finds all of the tenses of a word. For example, if you passed in Start, Inflectional will find Start, Started, and Starting. For nouns, Inflectional finds the single, plural, and possessive forms.
Thesaurus works like you would expect a thesaurus to. It will find variations of a word, Start, Begin, etc. Essentially, words that have the same meaning.
How do these relate to Contains and FreeText? Normally Contains looks for an exact match. FullText matches the meaning, but not the exact words in the query. FullText is very similar to Thesaurus, and even uses the Thesaurus in its work. FullText will break out the search string into its individual words, if there is more than one word. For each word, it generates the inflectional forms of the word, then identifies a list of matches for each word based on the thesaurus. FormsOf(Thesaurus… on the other hand just uses the thesaurus to do the search, without going through the inflectional step.
Using a combination of Contains, FreeText, and FormsOf you can give your users some real flexibility, ranging from exact matches to wide open searches.
SQL Server Migration Assistant
At work we have an Oracle based system we’ll be retiring sometime next year. We want to keep the data around for reporting, but don’t have the manpower or funds to properly flatten the database into a true data warehouse schema. We are leaning then to copy the data into our SQL warehouse database in almost a direct copy of the legacy’s schema with just a few minor tweaks.
In looking around for an efficient way to handle this, I found a tool called the SQL Server Migration Assistant for Oracle. This handy tool will copy everything, tables, views, stored procedures, triggers, just about everything you’d want.
In my case, all I really want is the table layouts. I will be making a minor tweak to the table layouts so I can combine four databases into a single one. In just a few hours, I was able to use SSMA to generate create table scripts for several hundred tables. I then brought the script into my favorite text editor, UltraEdit, and used it’s regular expression capability to add a source database field as the first field in every create table script. Saved ’em, ran it and in short order have a complete schema ready to hold my data.
Even using the SQL Server Migration Assistant in such a limited fashion, I was still able to save myself weeks of coding and trying to figure out how to manually map Oracle datatypes to SQL Server. If you are looking at any kind of data conversion take a look at this tool. You can find more info on the SSMA site at:
http://www.microsoft.com/sql/solutions/migration/oracle/default.mspx
SQL Server 2005 Reporting Services Screen Flicker / Redraw / Refresh Issue
We’ve had an annoying problem with a couple of our SQL Server 2005 Reporting Services (SSRS) reports lately. Two of our reports have a lot of parameters, over twenty each. When a user picked a value from a drop down to use for one of the many parameters we have, the entire report would flicker and slowly repaint. Very annoying and time consuming. Further it didn’t seem to matter if the report ran in the report manager, or hosted in a web page report viewer control.
We worked with this, and after several searches I finally found one article on the web which told us what our problem was.
http://sqlreportingservices.spaces.live.com/blog/cns!17F39A64B3FF4C5C!238.entry
The article does a good job of explaining the issue and how to fix it; I wanted to rebroadcast it to help bring attention to it. The summary version: if you have hard coded default values or expressions for the parameters for some reason SSRS will sometimes flicker when the user selects something from the list.
Instead, the answer is to use a query to supply the default value. Fortunately SSRS makes it easy, just create a dataset with something like:
SELECT ‘Default Value’ as TheDefaultValue
When we took our report and replaced all the hard coded defaults (and we had a lot) with queries to supply the defaults our flicker / redraw / repaint issue went away.
Again, read the linked post for a thorough understanding. My goal here is to spread the word; it took me a lot of search and wading through pages of posts to find ONE that had my answer. I thought if I had that much problem finding it others may too.
SQL Server Full Text Search – The ObjectPropertyEx Function
It can be quite useful to have a query that will list all of the tables in your database that currently hold full text indexes. Fortunately there is a function we can use, ObjectPropertyEx. We’ll combine it with a system view named sys.tables which gives us a list of all the tables in the current database.
Note, for todays example I will continue to use the AdventureWorks database, AdvWorksCatalog, and index I created in yesterday’s post on the FullTextCatalogProperty function.
ObjectPropertyEx takes two parameters. The first is the object ID of the table we want to work with. That’s why using sys.tables view is so nice, it already containts both the name of the table and it’s object ID. The second parameter is the name of the property we want to get, in this case TableHasActiveFulltextIndex will tell us if the table has an FTS (Full Text Search) index or not. Here’s a query that will list all of the tables with a full text index:
select t.[Name] as TableName
, ObjectPropertyEx(t.[object_id], ‘TableHasActiveFulltextIndex’) as IsFullTextIndexed
from sys.tables t
where ObjectPropertyEx(t.[object_id], ‘TableHasActiveFulltextIndex’) = 1
order by t.[Name]
TableName IsFullTextIndexed
——————– ——————
ProductDescription 1
This returns a list of the table name, and the value of 1 indicating the table does have a full text index on it. To get a list of all tables, simply omit the where clause. Tables without an full text index will have a 0 for the IsFullTextIndexed value.
We can further extend this function by using a second property, TableFulltextItemCount. This will tell us how many rows exist in the full text index for this table.
select t.[Name] as TableName
, ObjectPropertyEx(t.[object_id], ‘TableFulltextItemCount’) as NumberOfRows
from sys.tables t
where ObjectPropertyEx(t.[object_id], ‘TableHasActiveFulltextIndex’) = 1
order by t.[Name]
TableName NumberOfRows
——————- ————-
ProductDescription 762
In this case the number of rows matches the number of rows in the table.
This is because I have change tracking set to auto, and have a speedy system. Be aware there are times when this number won’t match the row count. This would primarily be when you have change tracking set to manual, have inserted new rows in to the source and have not issued the command to force the update of the full text index.
By comparing the TableFulltextItemCount to the number of actual rows in the source table, you can monitor your system health and activity. If you see the numbers grow ouside the normal boundaries, it can be an indication something is wrong. You can also use it for adjusting the frequency of manual update commands. If the numbers grow apart too much for your comfort, you may wish to make the updates occur more frequently. Conversely, if most of the time the numbers match you may wish to reduce the frequency of manual updates.
It’s not absolutely required to go to the sys.tables in order to get the count. If you already have the name of the table, you can use it in combination with the object_id function to get the count for that one table.
select ‘Production.ProductDescription’ as TableName
, ObjectPropertyEx(object_id(‘Production.ProductDescription’)
, ‘TableFulltextItemCount’) as NumberOfRows
TableName NumberOfRows
——————- ————-
ProductDescription 762
Using the above logic you could easily create a function to pass in the name of a single table and get the number of rows in the full text index.
As you can see, using ObjectPropertyEx with its two simple full text search related properties will give you some useful functionality for monitoring your system.
SQL Server Full Text Search – The FullTextCatalogProperty Function
There is a useful function built into SQL Server for dealing with Full Text Search Catalogs: FullTextCatalogProperty . It takes two parameters, the first is the name of the catalog, the second is the name of the property you want. While there are quite a few properties in the list, most of them have been depreciated. However, there are still a handful that can provide valuable information.
For todays examples, I am using the AdventureWorks database. I’ve created a catalog named AdvWorksCatalog, using the following command:
create fulltext catalog AdvWorksCatalog as default;
I then created one index, using this command:
create fulltext index on Production.ProductDescription
([Description])
key index PK_ProductDescription_ProductDescriptionID
on AdvWorksCatalog
with change_tracking auto
For more info on these commands, see my original series on Full Text Search which began in June of last year. (See the Arcane Lessons page for links to all the lessons).
The first property we will look at is IndexSize. You can call it up with the following command:
select FullTextCatalogProperty(‘AdvWorksCatalog’, ‘IndexSize’)
This will return the size, in megabytes (MB) of the indexes in the catalog. Since we only have one in this example, it is still quite tiny and thus will return a 0 for the result.
Next, let’s say you’d like to monitor the growth of the catalog by monitoring the number of items. To help with that, SQL Server provides a property named ItemCount.
select FullTextCatalogProperty(‘AdvWorksCatalog’, ‘ItemCount’)
In my case, it returned a value of 762 items in the catalog. An item is the same thing as a row in the source table. In my example, a select count from the Production.ProductDescription table also has a count of 762. These match since I’ve only created a full text index on one table in the database. In your system it will be the grand total number of rows for all the tables you have created indexes for.
A similar statistic is the UniqueKeyCount property, produced with this command:
select FullTextCatalogProperty(‘AdvWorksCatalog’, ‘UniqueKeyCount’)
This will tell you how many unique words were found in the catalog. In the Adventure Works database I’m using for SQL Server 2005, the count was 3,195. This would be like doing a select distinct word on the full text catalog. Each word may appear multiple times (once for each row it was found in) but will only be counted once in the UniqueKeyCount total.
The final property that is useful is PopulateStatus. This returns a value from 0 to 9 indicating what the full text engine is doing right now.
select FullTextCatalogProperty(‘AdvWorksCatalog’, ‘PopulateStatus’)
Since I didn’t want to constantly reference the on-line help to decipher the value, I added a little case statement to my SQL and will let it do the work:
select case FullTextCatalogProperty(‘AdvWorksCatalog’, ‘PopulateStatus’)
when 0 then ‘Idle’
when 1 then ‘Full population in progress’
when 2 then ‘Paused’
when 3 then ‘Throttled’
when 4 then ‘Recovering’
when 5 then ‘Shutdown’
when 6 then ‘Incremental population in progress’
when 7 then ‘Building index’
when 8 then ‘Disk is full. Paused.’
when 9 then ‘Change tracking’
else ‘Error reading FullTextCatalogProperty PopulateStatus’
end
There, much more user friendly. I would envision this would be most useful when preparing to do upgrades to the server. You’d want to be sure all full text activity had ceased before doing a server reboot, or applying a lot of SQL Scripts that altered the full text index structures.
In all of these examples, be aware SQL Server will return a NULL if there is an error. For example, you pass in the name of a catalog that does not exist, or the property is incorrect.