THE SQL Server Blog Spot on the Web


Richard Hundhausen: The DBAgilist

This is a mirror of Richard Hundhausen's (aka The DBAgilist) blog "Tales from the Doghouse."


  • Removing pingback, trackback, and comment spam from dasBlog

    OK, I finally got fed up with all of the spam in my historical dasBlog posts. It's really embarrassing to send a link to a colleague, only to have them snicker at all of the spam comments and trackbacks.

    For those of you who don't know, a trackback is basically an acknowledgement that lets authors keep track of who is linking to, or referring to, their articles. Used properly, trackbacks form a communication link between two blogs, so that new comments on one blog can ping the other, allowing readers to easily follow the discussion on both. The problem is that spammers have abused this mechanism, and bloggers end up with trackbacks and pingbacks to various gambling, herbal medication, and adult sites.
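    To make the mechanism concrete, here is a rough sketch of what a trackback ping looks like on the wire: the linking blog POSTs a few form-encoded fields to the target post's trackback URL. The title, URLs, and blog name below are hypothetical placeholders, not real addresses.

    ```python
    from urllib.parse import urlencode

    def build_trackback_ping(title, url, excerpt, blog_name):
        """Build the form-encoded body of a trackback ping.

        The linking blog POSTs these fields (Content-Type:
        application/x-www-form-urlencoded) to the target post's
        trackback URL; the target replies with a small XML document
        whose <error> element is 0 on success.
        """
        return urlencode({
            "title": title,
            "url": url,          # permalink of the post doing the linking
            "excerpt": excerpt,
            "blog_name": blog_name,
        })

    body = build_trackback_ping(
        title="Scrubbing dasBlog spam",
        url="http://example.com/blog/scrubbing-spam",  # hypothetical permalink
        excerpt="How I cleaned comment and trackback spam...",
        blog_name="Tales from the Doghouse",
    )
    print(body)
    ```

    Because the ping is just an unauthenticated POST, spammers can fire them at any blog in bulk, which is exactly the abuse described above.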

    Earlier this year I joined the ranks and disabled the trackback and pingback services in dasBlog. I then followed Scott Hanselman's advice and set up the Akismet spam-blocking service.
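    For anyone curious what Akismet does under the hood: each incoming comment is submitted to its comment-check endpoint, which replies with the literal body "true" for spam or "false" for ham. Below is a minimal sketch that only builds the request (it doesn't send it); the API key and all field values are placeholders.

    ```python
    from urllib.parse import urlencode
    from urllib.request import Request

    def akismet_comment_check(api_key, blog_url, user_ip, user_agent,
                              comment_content, comment_author_url=""):
        """Build (but don't send) an Akismet comment-check request.

        Akismet answers with the body "true" (spam) or "false" (ham).
        The api_key here is a placeholder, not a real key.
        """
        body = urlencode({
            "blog": blog_url,
            "user_ip": user_ip,
            "user_agent": user_agent,
            "comment_content": comment_content,
            "comment_author_url": comment_author_url,
        }).encode()
        return Request(
            f"https://{api_key}.rest.akismet.com/1.1/comment-check",
            data=body,
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )

    req = akismet_comment_check("YOURKEY", "http://example.com/blog",
                                "203.0.113.7", "Mozilla/5.0",
                                "Cheap herbal meds!!",
                                "http://spam.example")
    print(req.full_url)
    ```

    This catches new spam as it arrives, but it does nothing for the spam already sitting in the historical content files, which is what the tool below addresses.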

    The big remaining effort was cleaning up the <Comment> and <Trackback> elements that were spam, so, like others before me, I built a tool to assist with this.

    1. Download ScrubDasBlog.zip or ScrubDasBlogSource.zip to your hard drive
    2. Edit the blacklist.txt to include your own blacklisted URLs *
    3. Backup your existing feedback files: \content\*.dayentry.xml
    4. Run the ScrubDasBlog utility, specifying the path to your \content folder and the path to your blacklist.txt file, for example:

    scrubdasblog c:\inetpub\wwwroot\mydasblog\content c:\scrubdasblog\blacklist.txt

    * If spam makes up the bulk of the comments and trackbacks in your dasBlog history, you can generate a starter blacklist by going into your \content sub-folder and typing the following:

    type *.xml | find "AuthorHomepage" > blacklist.txt

    After you generate the blacklist.txt file, remove any legitimate sites and any duplicates before running the ScrubDasBlog utility.
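    The cleanup of that raw `find` output can also be scripted. The sketch below assumes the URLs sit inside `<AuthorHomepage>` elements (as in the `find` filter above); the allowlist of known-good sites is a hypothetical addition, and the sample URLs are placeholders.

    ```python
    import re

    def clean_blacklist(lines, allowlist=()):
        """Extract unique URLs from raw 'find "AuthorHomepage"' output,
        dropping duplicates and anything matching a known-good site."""
        urls = set()
        for line in lines:
            m = re.search(r"<AuthorHomepage>(.*?)</AuthorHomepage>", line)
            if m:
                url = m.group(1).strip()
                if url and not any(good in url for good in allowlist):
                    urls.add(url)
        return sorted(urls)

    raw = [
        "  <AuthorHomepage>http://casino.example</AuthorHomepage>",
        "  <AuthorHomepage>http://casino.example</AuthorHomepage>",
        "  <AuthorHomepage>http://friend.example</AuthorHomepage>",
    ]
    print(clean_blacklist(raw, allowlist=("friend.example",)))
    # → ['http://casino.example']
    ```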

    I would recommend downloading the source-code version and reading through my code. Please comment on any improvements you might make.

Powered by Community Server (Commercial Edition), by Telligent Systems