Spam filtering

From LQWiki

Most email on the Internet is unsolicited commercial email, also known as spam. For users the problem is more than an annoyance: it can also be a source of stress. For providers it is expensive, because the sheer volume of e-mail consumes a lot of bandwidth, processing power and disk space on the servers.

There are several ways to reduce the number of unwanted e-mails that ultimately reach your mail client. In many cases the preferred method is to fight the problem on the server, before the mail is delivered. This needs some care, however, because simply rejecting e-mails can also turn away and annoy legitimate senders.

Recommended spam filters

There are several filters that analyze an e-mail the moment it arrives on the server. Since the main resource being used on larger systems is disk space, it can be worthwhile to reject spam outright (and thus save disk space and bandwidth) and to use the available processing power to run multiple filtering applications. The most prominent filters are:

  • DynaStop is a free tool (GPL licensed) that addresses the issue of spam sent from dynamic IP addresses.
    • These types of IP addresses are typically used for residential dial-up and DHCP (DSL) users, where the Internet Service Provider has a Terms of Service or Acceptable Use Policy stating that end users with this type of connection are forbidden to send mail directly from their computer, thus bypassing the ISP's designated mail exchange server.
    • According to the author, the methodology took seven years to perfect and is based on an analysis of 371+ million IP addresses to date.
    • As far as the author of this software is aware, no other available software addresses this concern.
  • SpamAssassin is a free spam filtering system that filters about 95% of the spam out of a mail stream with only a small false-positive rate. The tool can tag e-mails with additional headers rather than delete them, so the user decides whether to filter them into another folder in their e-mail client. SpamAssassin can be trained, but does not need training to start giving good results.
  • CRM114 is an email filtering system much like SpamAssassin. The author claims it has 99.984% accuracy. It can be trained with personal email or an existing database.
  • Mozilla Mail/Thunderbird are e-mail clients with integrated spam filtering. The filtering is extremely easy to use and can be trained by the user to recognize unwanted mail more effectively.
  • Evolution can be used with bogofilter to get Bayesian spam filtering that can be trained.
  • For simple good-old-unixy effectiveness, it's hard to beat the old veteran, ifile. It is easy to train (pipe in an mbox file) and easy to sort with (5 lines in your .procmailrc). And it's not just spam vs non-spam: it can be spam vs work vs personal vs whatever.
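
Several of the filters above are trainable: the user feeds them example mail and they learn which words are typical for each category. The following is only a minimal sketch of that general idea, using a toy naive Bayes classifier with add-one smoothing. It is not the algorithm of any tool listed above, and the names `tokenize` and `NaiveBayesSorter` are made up for this illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesSorter:
    """Toy multinomial naive Bayes sorter (hypothetical, for illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.message_counts = Counter()          # category -> messages seen

    def train(self, category, text):
        """Record one example message for the given category."""
        self.message_counts[category] += 1
        self.word_counts[category].update(tokenize(text))

    def classify(self, text):
        """Return the most likely category, or None if nothing was trained."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())
        words = tokenize(text)

        best_category, best_score = None, -math.inf
        for category, seen in self.message_counts.items():
            counts = self.word_counts[category]
            # Add-one smoothing keeps unseen words from zeroing the score.
            denominator = max(sum(counts.values()) + len(vocabulary), 1)
            score = math.log(seen / total_messages)  # prior for the category
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


sorter = NaiveBayesSorter()
sorter.train("spam", "cheap pills buy now limited offer")
sorter.train("spam", "win money now click here")
sorter.train("work", "meeting agenda for the project review")
sorter.train("personal", "dinner with family on sunday")
print(sorter.classify("buy cheap pills now"))
```

Because the categories are just labels, the same sketch sorts mail into more than two folders, in the spirit of the spam vs work vs personal split mentioned above. Real filters use far more careful tokenizing and scoring.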

Recommended Blacklists (RBLs)

Spam is typically sent while trying to hide the real sender: spammers abuse other people's systems to get their message out. Several parties have started efforts to blacklist IP addresses that are mainly used to send spam. By adding some of these blacklists to your MTA (Mail Transfer Agent), such as Postfix, Qmail or Sendmail, you reduce the number of e-mails (mainly spam) that arrive at your mail server. The major blacklists are:

  • ipwhois.rfc-ignorant.org lists IP addresses whose network WHOIS records lack valid contact information
  • bl.spamcop.net
  • relays.ordb.org
  • opm.blitzed.org
  • list.dsbl.org
  • www.spamhaus.org
  • cbl.abuseat.org
  • dul.dnsbl.sorbs.net
  • blackhole.securitysage.com
  • virus.rbl.spamtrap.nl lists IP addresses known to send viruses.

SpamAssassin automatically queries several blacklists, and can also use collaborative filtering networks if the appropriate packages are installed (e.g. razor, pyzor).
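
DNS-based blacklists of this kind are conventionally queried by reversing the octets of the sender's IPv4 address, appending the blacklist zone, and looking up the resulting name in DNS. An answer means the address is listed, and a "no such name" reply means it is not. The sketch below shows that convention. The function names are made up for this illustration, it handles IPv4 only, and whether a given zone is still operating (or permits queries from your network) has to be checked separately.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets, then the zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone answers for this address.

    Assumption: any resolution failure is treated as "not listed", which
    also hides DNS outages; a real MTA distinguishes these cases.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# 192.0.2.99 is a documentation-only address, used here purely as an example.
print(dnsbl_query_name("192.0.2.99", "bl.spamcop.net"))
```

In practice the MTA performs this lookup itself once a blacklist zone is added to its configuration, so a script like this is mainly useful for checking by hand why a particular sender was rejected.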

Spam Fighting Techniques