THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Hilary Cotter

  • SQL Full-text search anomolies

    On the newsgropes recently I have encountered some posters who are perplexed by the results of their full-text searches.

    The first poster was puzzled as to why searches on f.b.i. would not find F.B.I. in his content.  SQL FTS and Microsoft Search products index uppercased tokens with periods in them differently than other lower or upper cased tokens. During indexing most language word breakers will index upper cased tokens with periods in them as the token, and then the token with all the periods stripped out. So F.B.I. is indexed as F.B.I. and FBI. All tokens are stored upper cased in the index with the exception of path names (for cosmetic reasons - e.g. in Indexing services a path like c:\TeMp is stored as c:\temp as lowercase was considered to look better than uppercase).

    There is no length cut off as far as I am aware. For example, C.R.E.E.P. (Committee to Re-Elect the President - the unfortunately and what we can only hope was unintentional acronym for President Richard Nixon's second election campaign) is indexed as CREEP and C.R.E.E.P. I've tried up to 10 letter combinations.

    Here is an example:

    Create table testAcronymTable(pk int not null identity constraint testAcronymTablePK primary key, charcol char(20))

    GO

    insert into  testAcronymTable(charcol) values('F.B.I.')

    insert into  testAcronymTable(charcol) values('f.b.i.')

    insert into  testAcronymTable(charcol) values('fbi')

    insert into  testAcronymTable(charcol) values('FBI') 

    insert into  testAcronymTable(charcol) values('F B I') 

    insert into  testAcronymTable(charcol) values('f b i') 

    GO

    create fulltext catalog testacronymcatalog

    GO
    --indexing using the neutral word breaker
    create fulltext index on testAcronymTable(charcol language 0) key index testAcronymTablePK on testAcronymCatalog
    GO
    --returns f.b.i., F.B.I., F B I and f b i  - expected, SQL FTS
    --throws away the .'s
    --you get the same results if f, b, and i are removed from the noise list
    drop fulltext index on testAcronymTable
    GO
    --indexing using the American English Word breaker word breaker
    create fulltext index on testAcronymTable(charcol language 1033) key index testAcronymTablePK on testAcronymCatalog
    GO
    select * from  testAcronymTable where contains(*,'fbi')
    GO
    --returns fbi F.B.I, and FBI - fbi, and FBI are expected, unexpected is
    --F.B.I, F.B.I is indexed as F.B.I. and FBI.
    select * from  testAcronymTable where contains(*,'FBI')
    GO
    --returns fbi F.B.I, and FBI - fbi, and FBI are expected, unexpected is
    --F.B.I, F.B.I is indexed as F.B.I. and FBI.
    select * from  testAcronymTable where contains(*,'F.B.I.')
    GO
    --returns fbi F.B.I, and FBI - F.B.I is expected, unexpected is
    --FBI and fbi, F.B.I is indexed as F.B.I. and FBI.
    select * from  testAcronymTable where contains(*,'f.b.i.')
    GO
    --returns f.b.i., F B I, and f b i -if your noise word list does not contain
    --f, b, and i
    --returns nothing -if your noise word list does contain
    --f, b, and i

    Note that this works for most languages but does NOT work for the neutral, Chinese (both types),  Japanese, and Thai word breakers. They will index F.B.I. as F, B, and I - in other words as three separate letters. Korean will index F.B.I. as F.B.I

    The next post covers some of the unusual behaviors while searching on C#, C+, C++, and F#, etc

     


  • handy replication proc

    A friend of mine called me up last night. He had a problem replicating 300,000,000 transactions per day, or was it per hour. Can't recall, it was a late night.

    Anyhowze, with volumes like this replication might not be the best tool. His problem was that the distribution agent to two of his subscribers was hanging on initializing. From experience I know this is caused by depleted buffers on the publisher and the only solution I know of is a reboot. Naturally this was not an option for him.

    His has his max distribution retention setting to 72 hours and was under the impression that this would decrease the amount of time required to purse distributed transactions. I told him that min distribution retention setting controlled this, and that his problem with expired subscribers was likely caused by this low max distribution retention setting or the history setting. Subscribers are expired based on the lesser of these two settings.

     Anyhowze on with my story - so he asks me what he can do to improve performance and I recommended he use sp_scriptdynamicupdproc to create a more efficient update proc (as his app did heavy updating), or its SQL 2005 analogue sp_scriptsupdproc.

    I expect another midnight call from him soon as he continues in his losing battle.   


  • transaction marks

    There seems to be some confusion out there in the internet about how to implement transaction marks. Or perhaps I was merely very confused about how to implement them.

     

    Transaction marks allow you to restore two or more database to a consistent point in time. For example if you have two database and your app writes to both databases, how do you restore them to a consistent point in time without kicking all your users off the databases? This is complicated by many things, for example backup speed.

    I googled looking for a working example but could not find one, so here is a script I wrote to demonstrate it.

     

    create database MarkedTransaction
    GO
    create database MarkedTransaction1
    GO
    use MarkedTransaction
    go
    create table test1(pk int)
    GO
    use MarkedTransaction1
    GO
    create table test1(pk int)
    GO
    backup database MarkedTransaction to disk='c:\MarkedTransaction.bak' with init
    GO
    backup database MarkedTransaction1 to disk='c:\MarkedTransaction1.bak'with init
    GO
    begin transaction
    insert into MarkedTransaction.dbo.test1(pk) values(1)
    insert into MarkedTransaction1.dbo.test1(pk) values(1)
    commit tran
    GO
    begin transaction mark6 with mark 'marked transaction'
    insert into MarkedTransaction.dbo.test1(pk) values(2)
    insert into MarkedTransaction1.dbo.test1(pk) values(2)
    commit transaction mark6
    GO
    begin transaction mark7 with mark 'marked transaction - 1'
    insert into MarkedTransaction.dbo.test1(pk) values(3)
    insert into MarkedTransaction1.dbo.test1(pk) values(3)
    commit tran mark7
    GO
    backup log MarkedTransaction to disk='c:\MarkedTransactionlog.bak' with init
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1.bak' with init
    GO

    use master
    GO
    --backing up the tail
    backup log MarkedTransaction to disk='c:\MarkedTransactionlogTail.bak' with init, norecovery
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1Tail.bak' with init, norecovery
    GO
    --restoring the database
    restore database markedtransaction from disk='c:\markedtransaction.bak' with norecovery
    restore database markedtransaction1 from disk='c:\markedtransaction1.bak' with norecovery
    GO
    --restoring the log
    RESTORE LOG MarkedTransaction FROM 
    DISK = 'c:\MarkedTransactionlog.bak'
    WITH  STOPATMARK = 'mark6', recovery
    GO

    RESTORE LOG MarkedTransaction1 FROM 
    DISK = 'c:\MarkedTransactionlog1.bak'
    WITH  STOPATMARK = 'mark6', recovery
    GO

    select * from MarkedTransaction.dbo.test1
    select * from MarkedTransaction1.dbo.test1
    --both 1, 2 are there, 3 is not
    --repeating,  but this time we will stop before
    backup database MarkedTransaction to disk='c:\MarkedTransaction.bak' with init
    GO
    backup database MarkedTransaction1 to disk='c:\MarkedTransaction1.bak'with init
    GO
    begin transaction
    insert into MarkedTransaction.dbo.test1(pk) values(6)
    insert into MarkedTransaction1.dbo.test1(pk) values(6)
    commit transaction
    GO
    begin transaction mark8 with mark 'marked transaction -2'
    insert into MarkedTransaction.dbo.test1(pk) values(7)
    insert into MarkedTransaction1.dbo.test1(pk) values(7)
    commit transaction mark8
    GO
    begin transaction
    insert into MarkedTransaction.dbo.test1(pk) values(8)
    insert into MarkedTransaction1.dbo.test1(pk) values(8)
    commit transaction
    GO
    backup log MarkedTransaction to disk='c:\MarkedTransactionlog.bak' with init
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1.bak' with init
    GO

    use master
    GO
    backup log MarkedTransaction to disk='c:\MarkedTransactionlogTail.bak' with init, norecovery
    backup log MarkedTransaction1 to disk='c:\MarkedTransactionlog1Tail.bak' with init, norecovery
    GO
    RESTORE Database MarkedTransaction  from disk='c:\markedtransaction.bak' with norecovery
    RESTORE Database MarkedTransaction1  from disk='c:\markedtransaction1.bak' with norecovery
    GO
    RESTORE LOG MarkedTransaction FROM 
    DISK = N'c:\MarkedTransactionlog.bak'
    WITH  STOPBEFOREMARK = N'mark8' , recovery
    GO
    RESTORE LOG MarkedTransaction1 FROM 
    DISK = N'c:\MarkedTransactionlog1.bak'
    WITH  STOPBEFOREMARK = N'mark8' , recovery
    GO
    select * from MarkedTransaction.dbo.test1
    select * from MarkedTransaction1.dbo.test1
    --both 1, 2, and 6 are there, 7 is not







     

     

     


This Blog

Syndication

Tags

No tags have been created or used yet.

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement