Tuesday, 11 May 2010

T-SQL Tuesday #006: fixing the text blob

Thought I'd join the T-SQL Tuesday party. Originally started by  Adam Machanic (Twitter: @AdamMachanic), this one is hosted by Michael Coles.

Many moons ago whilst working for MyDBA we came across a data corruption problem at a client site. Their vendor supplied application running on SQL Server 2000 stored a bunch of notes in a text column with lovely HTML formatting. The same data was also stored in a separate column in the same row sans HTML markup for use in full-text searches.

For reasons lost to antiquity, they started getting a lot of data corruption errors. Any query that tried to retrieve data from the table with a corrupt record failed with a batch terminating severity 20 error, or it may have been severity 21, and a nasty long complicated message was presented to the end users.

Running DBCC CHECKTABLE detected the errors and advised that the minimum repair level was REPAIR_ALLOW_DATA_LOSS. Running this on a test instance of the database resulted in the offending rows being removed from the table and DBCC CHECKTABLE ran clean. Needless to say, our client wasn't impressed with the legal situation that placed them in, data loss was not acceptable.

Luckily, we were able to determine that it was the text data field used by the full-text indexing that was corrupted. So with some help from the Lobster Pot man, Rob Farley (Twitter: @rob_farley), we came built a .Net application that traversed through the affected table till it found a corrupt record, stored all the remaining fields, stripped the HTML from the original column, deleted the corrupt record and inserted a new corrected record.

Result, much happier client.

1 comment:

Rob Farley said...

Ah, I remember that. And my blog is at http://sqlblog.com/blogs/rob_farley

:) Good to see you joining the T-SQL Tuesday crowd. :)