The previous post on spam covered the checkpoints where antispam filters can be applied. However, the nature of the filters themselves matters a lot for setting up an effective antispam solution.
Manual moderation

Sadly, manual moderation is the most effective. No filter is sophisticated enough to deal with every possible spam attack, and human judgment is the best authority here. Dealing with everything manually is certainly unproductive, but it's not right to avoid manual work entirely. The more human input and tuning filters get, the more effective they become.
Custom rules

The most primitive form of filtering is custom rules. Something as simple as "if the address is not mine, then it's spam" can sometimes do wonders. Creating filters manually may not be productive, but it's often quite effective.
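That "if the address is not mine, it's spam" rule can be sketched in a few lines. This is a minimal illustration, not a real mail filter; `MY_ADDRESS` and the message dictionary shape are placeholders I made up:

```python
# Hypothetical address and message format, for illustration only.
MY_ADDRESS = "me@example.com"

def is_spam_by_rule(message):
    """Custom rule: flag the message unless my address is among its recipients."""
    return MY_ADDRESS not in message.get("to", [])
```

A bulk mailing blasted at thousands of harvested addresses usually isn't addressed to you directly, which is why such a crude rule catches a surprising amount.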
White list

A white list assumes that every message not coming from a previously approved sender is probably spam. It's obviously good at keeping spam out, together with every message from a new sender. While having an extreme rate of false positives on its own, white listing is often applied first, to get important messages through without risk of losing them to other filters. Some systems can automatically add a sender to the white list once it has passed moderation.
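A white list is little more than a set of approved senders. A minimal sketch, with the auto-approve-after-moderation behavior mentioned above:

```python
class WhiteList:
    """Only previously approved senders get through; everyone else is suspect."""

    def __init__(self):
        self.approved = set()

    def approve(self, sender):
        """Add a sender, e.g. after one of their messages passes moderation."""
        self.approved.add(sender)

    def is_spam(self, sender):
        # Every unknown sender is treated as spam: zero false negatives,
        # but an extreme false-positive rate on new correspondents.
        return sender not in self.approved
```

This is why white listing works best as a first stage: a match means "definitely deliver", while a miss just passes the message on to the next filter instead of discarding it.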
Black list

A black list checks a message against a list of text strings and/or senders, assuming it is spam if a match is found. A black list is extremely effective against huge volumes of spam with roughly the same content. Overall, a black list is as good as you can make (or get) it. There is also a danger of false positives if the list reacts to words and phrases that can appear in legitimate messages.
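A black list sketch, including the false-positive hazard of naive substring matching (the banned strings here are just examples):

```python
class BlackList:
    """Match against known spammy senders and text fragments."""

    def __init__(self, banned_senders=(), banned_strings=()):
        self.banned_senders = set(banned_senders)
        self.banned_strings = [s.lower() for s in banned_strings]

    def is_spam(self, sender, text):
        if sender in self.banned_senders:
            return True
        # Plain substring matching is the false-positive trap:
        # banning "cialis" also matches the innocent word "specialist".
        lowered = text.lower()
        return any(s in lowered for s in self.banned_strings)
```

Word-boundary matching (regular expressions) softens that trap, at the cost of missing deliberately obfuscated spellings.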
Karma

Karma is like a light mix of white and black lists. It takes in past moderation events and calculates modifiers for specific senders. It's considered effective in the long term, but it is not helpful against new senders, or against human spammers who may start with a few valid messages first.
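Karma can be sketched as a running per-sender score built from moderation outcomes. The scoring (+1/-1) is an arbitrary choice of mine for illustration:

```python
from collections import defaultdict

class Karma:
    """Per-sender score accumulated from past moderation decisions."""

    def __init__(self):
        self.score = defaultdict(int)

    def record(self, sender, was_spam):
        """Feed in a moderation event for this sender."""
        self.score[sender] += -1 if was_spam else 1

    def modifier(self, sender):
        # A brand-new sender scores 0: karma says nothing about them,
        # which is exactly its weakness against fresh sources.
        return self.score[sender]
```

The modifier would typically be combined with other filters' output rather than used as a verdict on its own.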
Bayesian filter

This one is based on pure math and is extremely effective. A Bayesian filter keeps a database of every word it has ever encountered and how often each occurred in spam and non-spam messages. Upon receiving a new message, the filter looks up all the words in it and calculates the probability of it being spam. The downside is that it relies on manual correction, and it is slightly susceptible to poisoning, when a big chunk of valid text is used to smuggle a small chunk of spam along. It doesn't react well to randomly generated text either.
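A toy naive Bayes classifier shows the mechanics: count word frequencies per class during training, then combine per-word probabilities for a new message. This sketch uses add-one smoothing and log space for numerical stability; it is a teaching illustration, not a production filter:

```python
import math
from collections import Counter

class BayesFilter:
    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        """Manual correction feeds back into these counts."""
        words = text.lower().split()
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def spam_probability(self, text):
        spam_total = sum(self.spam_words.values())
        ham_total = sum(self.ham_words.values())
        vocab = len(set(self.spam_words) | set(self.ham_words))
        total = self.spam_msgs + self.ham_msgs
        log_spam = math.log(self.spam_msgs / total)
        log_ham = math.log(self.ham_msgs / total)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing everything out.
            log_spam += math.log((self.spam_words[word] + 1) / (spam_total + vocab))
            log_ham += math.log((self.ham_words[word] + 1) / (ham_total + vocab))
        # Convert the log-space scores back to a 0..1 probability.
        return 1 / (1 + math.exp(log_ham - log_spam))
```

The poisoning weakness is visible in the formula: a large block of words with strong non-spam counts drags `log_ham` up enough to outweigh a short spam payload.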
Proof of work
This method makes the sender perform additional tasks. They may be tasks that can only be performed by humans (captchas, questions) or computational ones. The latter is a kind of upgraded behavior analysis, with the additional effect of slowing the spam bot down.
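The computational variant can be sketched in the style of hashcash: the sender must find a nonce that makes a hash start with some number of zeros. This is a simplified illustration of the idea, not the actual hashcash format; the difficulty of 2 hex digits is deliberately tiny:

```python
import hashlib
from itertools import count

def solve_pow(message, difficulty=2):
    """Sender's side: find a nonce so the hash starts with `difficulty` zeros.
    Cheap for one message, expensive when multiplied by millions."""
    for nonce in count():
        digest = hashlib.sha256(f"{message}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

def verify_pow(message, nonce, difficulty=2):
    # Receiver's side: verification is a single hash, so it costs nothing.
    digest = hashlib.sha256(f"{message}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point: solving takes many hash attempts on average, verifying takes exactly one, so the cost lands entirely on the (bulk) sender.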
Nofollow

Google popularized the nofollow attribute for links, claiming it would reduce online spam by removing the search engine optimization value of spammed links. Well, it had no effect on spam at all, and nofollow was instead turned into a weapon to fight paid advertisement links. Removing value is extremely ineffective because carpet bombing is the main concept behind spam; it doesn't bother checking whether messages bring value on a case by case basis.
Poisoning

Poisoning tries to render a spam bot ineffective, usually by feeding it huge amounts of falsified data. It's not widely used, and its effectiveness is questionable.
Honeypot

This method tries to detect spam by eliciting an extra action that a human wouldn't perform in the same conditions. An extra form field that says "don't fill me" is the usual example.
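The "don't fill me" check amounts to one line on the server. The field name `website` here is a hypothetical choice (the field would be hidden from humans with CSS, so only a bot blindly filling every input would populate it):

```python
def is_bot_submission(form_data):
    """Honeypot check: the hidden 'website' field (hypothetical name)
    is invisible to humans, so any content in it signals a bot."""
    return bool(form_data.get("website", "").strip())
```

The trade-off is that honeypots only catch unsophisticated bots; anything that renders CSS or targets the form specifically will skip the trap.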
Dynamic method of sending
Periodically changing the method used to send a message prevents bots from memorizing it. Disposable email addresses and changing contact forms fall under this. It can be effective (if automated) at reducing the amount of spam, but it can't eliminate it completely. It can also lead to lost messages if an expired method is used to send a valid one.
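One way to automate this is a contact-form token that rotates on a schedule, so a bot that memorized an old form can't replay it. This rotation scheme is my own illustrative sketch, not a description of any particular product; `SECRET` and the one-hour period are assumptions:

```python
import hashlib
import time

SECRET = "replace-with-a-real-secret"  # assumption: server-side secret

def form_token(period=3600, now=None):
    """Token embedded in the contact form; it changes every `period` seconds."""
    window = int((now if now is not None else time.time()) // period)
    return hashlib.sha256(f"{SECRET}:{window}".encode()).hexdigest()

def accept(token, period=3600, now=None):
    """Accept the current and previous window, so a form loaded just before
    rotation still works; anything older (i.e. memorized) is rejected."""
    now = now if now is not None else time.time()
    window = int(now // period)
    valid = {hashlib.sha256(f"{SECRET}:{w}".encode()).hexdigest()
             for w in (window, window - 1)}
    return token in valid
```

The grace window for the previous period is exactly the trade-off the text describes: make it too short and valid messages are lost, too long and bots get time to reuse a harvested form.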
Collaborative database

This method usually relies on collaboration between multiple participants in building a huge spam database. Messages are simply checked against it, by hash or otherwise. The effect depends on database quality, and the downside is that such a database can itself be poisoned into treating valid messages as spam.
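The check-by-hash idea can be sketched as a shared set of message fingerprints. The whitespace/case normalization here is a simplification I added so trivial variations still match; real systems use fuzzier fingerprints:

```python
import hashlib

class SharedSpamDB:
    """Participants report spam; everyone checks new messages by hash."""

    def __init__(self):
        self.known_spam = set()

    @staticmethod
    def _fingerprint(text):
        # Normalize case and whitespace so trivial variations still match.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def report(self, text):
        """A participant flags this message as spam for everyone."""
        self.known_spam.add(self._fingerprint(text))

    def is_spam(self, text):
        return self._fingerprint(text) in self.known_spam
```

The poisoning risk is visible here too: `report` trusts participants, so a malicious one can fingerprint a legitimate message and have everyone else reject it.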
The next post in the series is going to cover some factors to consider when choosing a filter, plus examples from my personal experience.