Constructing the ultimate antispam solution

The two previous posts in this series covered how fear turns fighting spam into bad practice, the common checkpoints in spam filtering, and the numerous methods that can actually be employed for filtering spam.

Now let’s combine that knowledge and summarize what a perfect antispam solution should be.

Maximum effectiveness

An effective filter must stop as much spam as possible while letting through all valid messages. Taking a valid sender and a spam bot as input, the following table shows what should be achieved or avoided for best results.

Outcome	Valid sender	Spam bot
Message sent	Good	Bad
Message not sent	Bad	Good
Message received	Good	Bad
Message filtered out	Bad	Good

Looks simple? There are some hidden pitfalls in this table, though.

A valid sender being unable to send a message is an extremely bad case. Few things are worse than a crappy antispam filter telling someone who was about to send something your way that he is probably an evil spam bot and should shut up.
Receiving a spam message is NOT the worst case. The point of a good antispam defense is not to fight spam at any cost; it’s to maximize the amount of spam filtered out without hindering valid messages. Some spam will get through. That’s normal and should be accepted, not turned into scorched earth with excessive defense.
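The asymmetry between the two "Bad" cases can be made concrete with a small scoring sketch. It tallies the outcomes from the table (valid message delivered or blocked, spam delivered or blocked) and charges a blocked valid message much more than a delivered spam message. The 10:1 weight, the `Outcome` type and the `filter_cost` function are illustrative assumptions, not measured values or part of any real tool.

```python
from dataclasses import dataclass

# Hypothetical cost weights: blocking a valid sender is assumed to be
# far worse than letting a single spam message through.
COST_BLOCKED_VALID = 10.0
COST_DELIVERED_SPAM = 1.0


@dataclass
class Outcome:
    is_spam: bool    # ground truth: did the message come from a spam bot?
    delivered: bool  # did the message get past the filter?


def filter_cost(outcomes):
    """Total cost of a filter's decisions under asymmetric weights.

    Delivered valid messages and blocked spam (the "Good" cells) cost
    nothing; blocked valid messages and delivered spam (the "Bad" cells)
    are charged with the weights above.
    """
    cost = 0.0
    for o in outcomes:
        if not o.is_spam and not o.delivered:
            cost += COST_BLOCKED_VALID
        elif o.is_spam and o.delivered:
            cost += COST_DELIVERED_SPAM
    return cost


if __name__ == "__main__":
    # An aggressive filter blocks all spam but also one valid message.
    aggressive = [Outcome(False, False)] + [Outcome(True, False)] * 5
    # A lenient filter lets three spam messages through but no valid one is lost.
    lenient = [Outcome(False, True)] + [Outcome(True, True)] * 3 + [Outcome(True, False)] * 2
    print(filter_cost(aggressive))  # 10.0
    print(filter_cost(lenient))     # 3.0
```

Under these assumed weights the lenient filter scores better even though more spam reaches the reader, which is the point of the table: losing a valid message outweighs seeing a few spam messages.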

Zero tolerance

I want to share a secret method I have been using to fight spam for years. It’s great, it’s simple to use, and it makes sense.

Spam doesn’t exist.

By acknowledging spam in any way, whether by opening those letters, letting a spam comment appear on your blog, clicking (NO!!!) a spammed link or trying to “unsubscribe”, you create positive feedback for the spammer and turn yourself into a target for life.

Don’t ever interact with spam in any way.
Whatever cannot be processed automatically must pass before human eyes; don’t set default actions for such cases.

Amount of manual control

It’s impossible to avoid manual control, and trying to avoid it is a recipe for disaster. Still, checking everything after the spam filter to make sure it’s working is almost as dumb.

Manual control should only be needed:

for initial setup;
for corrections to the initial setup;
for cases that the spam filter can’t confidently handle.

The rest of the time, the solution must be good enough that you can trust it to work automatically.

Summing up recommendations

Use a white list to ensure that valid messages pass.
Avoid methods that hinder the sending of messages.
Avoid crowd-wisdom methods: they are heavily marketed, but their quality is low.
Don’t use default actions in uncertain situations.

There are actually more things to avoid than to use. It’s easier to break an antispam solution than to build one.

Real example

On this blog I use two methods:

White list – contains everyone who has at least one approved comment.
Black list – manually maintained, containing keyword patterns for links.

That’s all. The white list ensures that every returning commenter can post comments without any trouble. The black list ensures that messages that are certainly spam are filtered automatically and don’t need additional checks.

How effective is it? I have to manually process around 3% of comments. Part of this is first comments from valid contributors (which ensures there is zero chance for spam to pass); the rest is spam with patterns not yet in the black list.
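The setup described above can be sketched as a three-way decision: approve white-listed authors, filter comments whose links match a black-list pattern, and hold everything else for a human, with no default action. This is a minimal sketch under stated assumptions: the `Comment`, `Moderator` and `Decision` names, identifying authors by e-mail address, and matching black-list patterns only against URLs found in the comment text are hypothetical choices, not the actual code behind this blog.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"  # published immediately
    SPAM = "spam"        # certain spam, filtered with no further checks
    HOLD = "hold"        # uncertain: waits for a human, no default action


@dataclass
class Comment:
    author_email: str  # hypothetical identity key for the white list
    text: str


# Rough URL matcher (an assumption; real link extraction may differ).
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


@dataclass
class Moderator:
    white_list: set = field(default_factory=set)    # authors with an approved comment
    black_list: list = field(default_factory=list)  # manually maintained link patterns

    def add_pattern(self, pattern: str) -> None:
        """Manually add a link keyword pattern (a regular expression)."""
        self.black_list.append(re.compile(pattern, re.IGNORECASE))

    def moderate(self, comment: Comment) -> Decision:
        # Returning commenters always pass.
        if comment.author_email in self.white_list:
            return Decision.APPROVE
        # Certain spam: some link in the comment matches a black-list pattern.
        for url in URL_RE.findall(comment.text):
            if any(p.search(url) for p in self.black_list):
                return Decision.SPAM
        # Everything else goes before human eyes.
        return Decision.HOLD

    def approve(self, comment: Comment) -> None:
        """Called when a human approves a held comment: white-list its author."""
        self.white_list.add(comment.author_email)


if __name__ == "__main__":
    mod = Moderator()
    mod.add_pattern(r"cheap-?pills")
    first = Comment("reader@example.com", "Nice post!")
    print(mod.moderate(first))  # Decision.HOLD
    mod.approve(first)
    print(mod.moderate(Comment("reader@example.com", "Thanks again")))  # Decision.APPROVE
    print(mod.moderate(Comment("bot@example.net", "Visit http://cheap-pills.example/buy")))  # Decision.SPAM
```

Checking the white list first mirrors the priority in the text: a returning commenter is never blocked, and only comments that match neither list reach the manual queue.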

A reasonable amount of manual control + letting valid messages through + a flexible, self-controlled filter for spam messages = the ultimate antispam defense, in my opinion.

4 Comments

Lyndi 2008-12-10 #

What you have said makes sense but is it really practical? On my blog Akismet stops lots of spam comments (I have removed all the other anti-spam plugins). It is a pain going through these comments to ensure that legitimate comments are not caught up in there but surely it would be more work to manually black-list each of the spam comments as they come. Granted many of these things come from the same place so eventually the work involved to black-list them will reduce. This is something that I really need to think about.
Rarst 2008-12-10 #

@Lyndi Well, it’s totally practical for me. :) And using only lists is just an example. I don't like Akismet much and consider it overhyped because it is bundled with WordPress. I haven't used it much as a blogger, but as a reader I lost count of the times my totally normal comments were treated as spam. A black list is essentially a set of powerful custom rules. Marking a message as spam with an external service like Akismet tells the service that something about that message is spammy. The service may or may not deduce the signs correctly, and in either case it will try to apply the result (which you know nothing about, by the way) to following comments. By maintaining a black list you eliminate the guessing and create very specific, precise rules that are very unlikely to affect any legitimate comments. If (when?) spam gets trickier to manage, I will probably add Bayesian filtering to the mix; it's rather effective and, unlike a black list, it is self-learning.
The DataRat 2009-11-06 #

Pretty much the same as my practice. I have a Christian forum on Yahoo Groups, and posts there must have my approval the first time or two a new person sends a comment. Readers of my forum see ~zero~ spam! Plus, you have to join before you can even attempt posting. THAT cuts down on a lot of spam, as spammers don't usually want to take the time to fill out online forms to join. My personal e-mail is on two accounts: Gmail, and my ISP. Google's spam filter does a TERRIFIC job of catching spam. But it puts spam in a "Junk" folder. So, I ~still~ have to review, then delete, spam. Got to check for the rare instance when good mail is misidentified as spam. I use Gmail for my 'public' e-mail address. This really minimizes the amount of spam received on my ISP account address; I reserve that for friends. But, in all cases, it's (as you say) critical never to do anything with spam except delete it. NEVER open it or respond to it. And certainly never click on anything in it! And I never open attachments or click on links that I'm not sure were actually sent by a "friend". Even if it's from a friend, I won't click on it if I wasn't expecting it. Sometimes this means e-mailing back first, asking if they actually sent the link or attachment. The only time I ever downloaded an e-mail virus was a few years ago, in an attachment sent by a dear friend. The virus on her PC sent itself out to everybody on her address list! The DataRat
Rarst 2009-11-06 #

@DataRat > Google’s spam filter does a TERRIFIC job of catching spam. But it puts spam in a “Junk” folder. So, I ~still~ have to review, then delete, spam. Got to check for the rare instance when good mail is misidentified as spam.
Gmail auto-deletes everything in the spam folder once it is older than 30 days. But you still have to look through it: it is good at catching spam, but it produces quite a few false positives as well. I used Gmail to consolidate all of my email and overall am quite happy with it.
