I am having problems doing a "Full Text Search" on the MSSQL2005 database
server for some search terms (those that have a dot or ampersand).
There are no problems doing FTS search with "normal" words.
We're using Microsoft SQL Server Standard Edition, version 9.00.2047.00
Build 3790: Service Pack 1.
The database and tables were created from scratch (not migrated).
Example terms which do not bring back any results...
F&C
P&O
S&P
E.On
Here is an example query:
SELECT TOP 10 *
FROM Content
WHERE where CONTAINS(text_content,'F&C')
Tried escaping the term, putting quotes and replacing the ampersand with &
When doing the search with LIKE I get the results
SELECT TOP 10 *
FROM Content
WHERE where text_content LIKE '%F&C%'
Thanks
Bimal
Bimal wrote on Fri, 2 Mar 2007 00:57:03 -0800:
> I am having problems doing a "Full Text Search" on the MSSQL2005 database
> server for some search terms (those that have a dot or ampersand).
> There are no problems doing FTS search with "normal" words.
> We're using Microsoft SQL Server Standard Edition, version 9.00.2047.00
> Build 3790: Service Pack 1.
> The database and tables were created from scratch (not migrated).
> Example terms which do not bring back any results...
> F&C
> P&O
> S&P
> E.On
> Here is an example query:
> SELECT TOP 10 *
> FROM Content
> WHERE where CONTAINS(text_content,'F&C')
> Tried escaping the term, putting quotes and replacing the ampersand with
> &
> When doing the search with LIKE I get the results
> SELECT TOP 10 *
> FROM Content
> WHERE where text_content LIKE '%F&C%'
When FTS indexes your column, F&C will be broken up to two words - 'F', and
'C'. By default the noise word file contains all single letters, so these
are ignored in the indexing. That's why you won't find them.
There are some workarounds. Edit the noise word file for the language you
are using, and leave just a single space in it. Then run a full population -
this will allow FTS to index everything. You can then use the phrasing
searches to help find the entries, but it still may not always be exact, eg.
SELECT TOP 10 *
FROM Content
WHERE where CONTAINS(text_content,'"F&C"')
would find all rows where text_content has the following:
F&C
F.C
F C
etc.
The only way to get an exact match would be use LIKE (as you've found), to
use text replacement (eg. F&C becomes FANDC in your table, and you do the
same substitution in your query), or to write your own wordbreaker to handle
these the way you want them to (eg. drop & when it's surrounded by spaces,
but leave it when it's touching another character). The first option is the
simplest, the last option the most complex.
Dan
|||Thanks for that detailed reply - fyi, we've chosen to treat words (so A&C
becomes AXXNDXXB) but i'm not happy that we have this workaround.
Do you know why the indexer treats the term "F&C" as individual characters
(so it really thinks you've entered F & C) and so excludes these from being
indexed?
I would be interested to know what Microsoft's official response is on this
(whether they plan on improving the word breaker), I do think it should treat
terms like F&C or E.On (all valid company names) as words and not individual
characters.
Thanks
Bimal
"Daniel Crichton" wrote:
> Bimal wrote on Fri, 2 Mar 2007 00:57:03 -0800:
>
> When FTS indexes your column, F&C will be broken up to two words - 'F', and
> 'C'. By default the noise word file contains all single letters, so these
> are ignored in the indexing. That's why you won't find them.
> There are some workarounds. Edit the noise word file for the language you
> are using, and leave just a single space in it. Then run a full population -
> this will allow FTS to index everything. You can then use the phrasing
> searches to help find the entries, but it still may not always be exact, eg.
> SELECT TOP 10 *
> FROM Content
> WHERE where CONTAINS(text_content,'"F&C"')
> would find all rows where text_content has the following:
> F&C
> F.C
> F C
> etc.
> The only way to get an exact match would be use LIKE (as you've found), to
> use text replacement (eg. F&C becomes FANDC in your table, and you do the
> same substitution in your query), or to write your own wordbreaker to handle
> these the way you want them to (eg. drop & when it's surrounded by spaces,
> but leave it when it's touching another character). The first option is the
> simplest, the last option the most complex.
> Dan
>
>
No comments:
Post a Comment