THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Spam Fighting and Lessons Learned

Here on SQLblog, we take spam seriously. Actually, I should rephrase that: I take spam really, really, really seriously, some other people take spam somewhat seriously, and some don't really seem to care. It all balances out in the end. We've done a pretty good job keeping the site clean and mostly spam-free. We have a few different automated anti-spam tools, but it's not enough. I do manual sweeps though most of the site once or twice a week and pick up anything that managed to slip through the cracks. I'm constantly updating and refining our rules to try to keep the manual work as quick and painless as possible. I take it as a point of pride that we have one of the lowest spam rates I've ever seen on a large blog site with anonymous comments enabled.

And all of that work, and my pride, is still not enough. A couple of days ago I discovered, amidst a mild rush of panic, that spammers were getting their links on the site by creating new user accounts and putting the links in their profiles. Many of these accounts had obvious names that immediately signaled that something was up. A user account called "69 Sexy" on a site like SQLblog is a dead giveaway. Some of them were obvious but in stranger ways; we had over 80 accounts created by spammers advertising various fake Christmas tree sites. Christmas tree spammers? Seriously?

Being the kind of guy I am, and given the amount of work I put in fighting spam on the site, there was no way I was going to let these accounts live. So I hit the database and started hacking around trying to figure out how to stop them. "Hacking around" is not generally a good thing when you're trying to do database work, but the design that underlies Community Server is, shall we say, not the greatest I've ever seen. After several minutes of prodding and false starts, I found the table where the profile information is stored and thought about how to do the cleanup.

I looked at a few of the spam accounts and found that they shared some common qualities: They all had "interesting" keywords in their profiles, but none of them had associated posts in blog comments or the forums--these, if they ever did exist, would have been taken care of by one of our other anti-spam measures.

I did a few queries and discovered that there are five or six core tables in which most user data is stored, and about 30 satellite tables. My fear was that I might catch a real, valid account using one of the spam keywords (some valid word could have an embedded fragment of a spam keyword), and I decided to let referential integrity protect me. I wrote a script that started a transaction, did the deletes from the core tables in a TRY block, and if any exception occurred a rollback would fire in the CATCH block. I tested this logic using my own account (safely inside of a nested transaction to make absolutely certain I didn't delete myself!), and was pleased to see that the transaction was rolled back as expected.

And from that point on, it was kind of fun. I thought of all sorts of keywords and ran my script against them, removing scores of user accounts from the site. Every few keywords I would check the users list and find a few more. Goodbye, "forex traders". Goodbye, "payday loans". And see you later, "wire frame Christmas trees". (What the hell is a wire frame Christmas tree? I may never know.)

Of course, I wasn't using these keywords in their full, natural form. If I had a user with "forex trader" in his profile, I might delete all users where I found the string "forex", and maybe all users where the string "trade". The work was done using code along the lines of: Profile LIKE '%' + @string + '%'. I figured that it didn't matter if I caught some non-spam users, thanks to my diligent work with TRY-CATCH and the database's referential integrity constraints.

The next day, I received an e-mail from one of the site's bloggers. He was locked out of his account, unable to access his blog. And so I jumped back into the database and made a startling discovery: The database, as it turns out, has a few referential integrity constraints. And it also lacks a few referential integrity constraints in places where a normal person might expect to see them. Oops.

Turns out that I deleted the user accounts for about a quarter of our bloggers, and about 50 users who had left perfectly valid comments in blog posts. Luckily, we're almost as serious about backups as we are about spam, and a quick database restore and an hour and a half of scripting later all of the deleted user data was restored.

The morals of this story, for me: First and foremost, assumptions aren't worth much. This is something we all know, yet it's a trap I fall into over and over and over. Some lessons are more difficult to learn than others. Second, and more technical, don't trust a database design until you fully understand its ins and outs. Otherwise, surprises can and will pop up. This is another lesson I've learned countless times and seem to keep hitting. Something in me really wants to believe that most database designers do the right thing. But the reality is, they don't.

And the final lesson of the day is not to get too caught up in pride or determination. I wanted those spammers out, and I wanted them out that moment. I wasn't willing to wait a bit longer or think a bit more carefully through my solution. And I paid the price for that rash decision making. Those spam links weren't actually hurting anything, they were merely causing me irritation. In retrospect, I could have handled the situation in a much more thoughtful manner.

In the end, no real harm was done. A valuable lesson or two was learned. And I got a topic for a blog post out of the deal. Not bad. And one day those spammers will feel my wrath... In a cool, level-headed way.

Published Wednesday, December 16, 2009 12:01 PM by Adam Machanic
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

mjswart said:

Very a propos. About an hour ago I wrote a post on the topic of learning from failures. http://michaeljswart.com/?p=340

December 16, 2009 12:05 PM
 

AaronBertrand said:

Adam, do you know how to stop the topsy crap?  I get a comment from topsy on every post, "this post was mentioned on twitter..."

December 16, 2009 12:10 PM
 

Adam Machanic said:

Michael, glad you enjoyed the post. Yours is one of the accounts I killed :-)

Aaron, I've allowed that topsy stuff (in addition to ubervu) to come through since just before it started I'd had requests for "tweetback" support on the site and that's what it seems to provide. Do you consider it to be spam? I can certainly reevaluate...

December 16, 2009 12:40 PM
 

AaronBertrand said:

I can see how some would see value in it, but I don't really.

December 16, 2009 1:29 PM
 

Michael K. Campbell said:

You guys need to give ReverseDOS a try.

It's a simple HTTP Module for ASP.NET sites and uses filters to let you flag/trap spammy crap (patterns via plain text or regex). When it finds a matching result (or spammy comment) it lies to the bot pushing the spam up and tells it that it's 'processing' the request... for about a minute or so (you can configure the time as needed) then it throws a 503 - Server Busy. Other users on your site won't see anything out of the ordinary. But you'll end up:

a) stopping the spam from going into your DB

b) slowing the BOT down.

I've found that most bots (or their owners) notice that you're slowing their bots down significantly, and start to leave you alone after a while.

RDOS is designed to TRUST content from authenticated or logged-in users, but it shouldn't be too hard to turn that functionality off. (Let me know if you need a hand with that.)

http://www.angrypets.com/tools/rdos/

December 16, 2009 2:31 PM
 

andyleonard said:

  Adam: That's pretty harsh. I mean, I try to sell a few wireframe Christmas trees and you lock me out!

  Aaron: I moderate comments. That caught a lot of the stuff I used to get.

:{> Andy

December 16, 2009 3:26 PM
 

Adam Machanic said:

Michael: We're already running it :-) ... it's certainly saved us from a lot of stuff, but not everything.

December 16, 2009 3:50 PM
 

AaronBertrand said:

Andy,

YMMV, but I don't like the idea of moderating comments.  It means that I have to "okay" the conversations of others and that they can't carry on in my absence.  Since there are many occasions in a given month where I don't visit the site for a week or more, I certainly would rather let a few crap comments get in than stifle discussions altogether.

December 16, 2009 6:51 PM
 

Jimmy May, Aspiring Geek said:

Adam, I'm sympathetic, applaud your efforts, & am proud of your mea culpas.

December 16, 2009 11:03 PM
 

Brian Tkatch said:

Adam, good story, and nice that you posted in. It's good to know you are a real person. :)

December 17, 2009 9:52 AM
 

KKline said:

Great post, Adam.  I especially appreciate that it probably took you as long to write the informative article as it did to write your initial scripts or run your recovery.  It takes time to instruct others, so you've done the community a services.

I don't moderate comments, incidentally, but I've got alarms set whenever they come in and, when appropriate, delete them.

December 23, 2009 5:40 PM
 

Web Design Company said:

Excellent post Adam.I really appreciate your effort.

December 30, 2009 2:32 AM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement