
Mail Filter Documentation

2.0

A Rule Based SPAM Filter

The web page for this email filter is "An Email Filter for a UNIX Shell Account" on www.bearcave.com.

This e-mail filter is designed for email on UNIX/Linux shell accounts.

Installing the SPAM Filter

To install do the following:

Unpack the software. The software is packaged with GNU tar and compressed with gzip. To unpack it:
```
   tar xzvf mail_filter.tar.gz
```
The software will be unpacked into the directory mail_filter.
Build the software. The Makefile is targeted at UNIX make or, on Linux, pmake. Simply enter the command pmake and the mail_filter executable should be built.
I developed the software on Windoz, so a Windoz Makefile (for nmake) is included as well (see Makefile_win). To build on Windoz enter:
```
    nmake -f Makefile_win
```
The mail filter parameter file SpamFilterParams must be in your home directory. So copy the parameter file mail_filter/SpamFilterParams to your home directory.
Make a symbolic link from your email file to a file in your local directory named inbox. If your email file is /var/mail/iank, then make the following symbolic link:
```
    ln -s /var/mail/iank inbox
```
Set up mail forwarding. Unfortunately this differs on Linux and on UNIX (e.g., FreeBSD).
- Linux
  Mail forwarding is done via procmail. On my Linux system I set up the following .procmailrc file:
```
   :0fw
   * < 75000 
   | /home/iank/bin/mail_filter
  
```
  where the mail_filter program is installed in /home/iank/bin.
- FreeBSD
  On FreeBSD UNIX I used a .forward file that contained:
```
  "|~iank/bin/mail_filter"
  
```
That should do it. Sorry, I can't provide support if this does not work.

Design Objectives

The SPAM filter has the following design objectives:

Speed/low system overhead.
Some spam filters are implemented in interpreted languages like Python or Perl. This spam filter is implemented in C++. This makes it faster and allows it to consume fewer system resources while processing email.
Lightweight.
As the volume of spam has increased, spam filters like SpamAssassin and DSPAM have become very sophisticated. These spam filters are powerful, but I've found their configuration obscure. I wanted a filter that was fast and easy to use (of course since I wrote it, it is easy for me to use).
Reduce the size of the suspected spam file.
Especially when there are virus outbreaks, I get hundreds of spams a day. If the spam is routed to a suspected spam file, reviewing the subject lines for all these spams is tedious. So this filter attempts to reduce the size of this file by deleting email that is virtually without question spam (I don't have a lot of people sending me email about penises or drugs like Viagra or Xanax). The mail filter can also be configured to remove any email that contains a base64 encoded section. This deletes most virus email.
Configurable without recompiling.
Earlier versions of the spam filter compiled in parameters like the spam word list. This version uses a parameter file. This allows the filter to be customized without recompiling, at the cost of slower start-up (although I have not noticed the difference in practice).
Rule based.
The theory behind the design I've used here is that by implementing a set of rules to identify spam, most of the spam can be recognized and much of it can be discarded.

Rules for Recognizing Email and Spam

The email is treated as valid and copied into the inbox file (which, as noted above, is a symbolic link to your mail file) if either of the following holds:

An entry in the SpamFilterParams "to_list" is found in the "To:" part of the header. This is where mailing list addresses go.
An entry in the "from_list" is found in the "From:" part of the header. This is the place to put the addresses of people you know.
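
The two acceptance rules above could be sketched in C++ roughly as follows. This is an illustration only, not the filter's actual source: the function names (toLower, matchesList, isKnownGood) are hypothetical, and the use of case-insensitive substring matching is an assumption about how list entries are compared with the header text.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-case a copy of the string so that comparisons ignore case.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if any non-empty entry of `list` occurs as a substring of `headerValue`.
// (Substring matching is an assumption made for this sketch.)
bool matchesList(const std::string& headerValue,
                 const std::vector<std::string>& list) {
    const std::string value = toLower(headerValue);
    for (const std::string& entry : list) {
        if (!entry.empty() && value.find(toLower(entry)) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// An email is accepted outright if its "To:" header matches to_list
// or its "From:" header matches from_list.
bool isKnownGood(const std::string& toHeader,
                 const std::string& fromHeader,
                 const std::vector<std::string>& toList,
                 const std::vector<std::string>& fromList) {
    return matchesList(toHeader, toList) || matchesList(fromHeader, fromList);
}
```

With a to_list holding a mailing list address and a from_list holding a friend's address, a message matching either list would go straight to the inbox, and everything else would continue on to the spam checks.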

Recognizing spam:

Check the "To:" and "Cc:" lists to make sure that it is actually addressed to you. Many spammers do not include your e-mail address in the "To:" line since they are using mailing lists, CC lines or direct SMTP connections. I get a lot of spam addressed to "postmaster". In the "old days" this was always supposed to be a valid email address for an internet domain. Given the rise of spam, this is no longer practical.
Of course even correctly addressed email may still be spam. But if the email does not have your e-mail address in either the "To:" or the "Cc:" line, it will be put in the spam folder.
Check the subject line for spam and kill words (e.g., breast, xanax). If a word or phrase is found, move the email to the appropriate file (e.g., junk_mail).
Pick up the boundary line in the email header. This is used in processing the email body.
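
As a rough sketch of the header checks listed above, the C++ below classifies a message from its "To:", "Cc:" and "Subject:" values and pulls the MIME boundary out of a Content-Type value. It is not taken from the filter's source. The names (Verdict, classifyHeader, extractBoundary), the order of the checks, and the mapping of kill words to discarding and spam words to junk_mail are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

enum class Verdict { Inbox, Junk, Discard };

// Lower-case a copy of the string so that comparisons ignore case.
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if any word or phrase in `words` occurs in `text`, ignoring case.
bool containsAny(const std::string& text, const std::vector<std::string>& words) {
    const std::string haystack = lower(text);
    for (const std::string& w : words) {
        if (!w.empty() && haystack.find(lower(w)) != std::string::npos) return true;
    }
    return false;
}

// Header-stage classification (check order is an assumption of this sketch).
Verdict classifyHeader(const std::string& to, const std::string& cc,
                       const std::string& subject, const std::string& myAddress,
                       const std::vector<std::string>& spamWords,
                       const std::vector<std::string>& killWords) {
    const std::string me = lower(myAddress);
    const bool addressedToMe = lower(to).find(me) != std::string::npos ||
                               lower(cc).find(me) != std::string::npos;
    if (containsAny(subject, killWords)) return Verdict::Discard;
    if (!addressedToMe) return Verdict::Junk;
    if (containsAny(subject, spamWords)) return Verdict::Junk;
    return Verdict::Inbox;
}

// Extract the boundary parameter from a Content-Type value such as
//   multipart/alternative; boundary="abc123"
// Returns an empty string when no boundary parameter is present.
std::string extractBoundary(const std::string& contentType) {
    const std::string key = "boundary=";
    const std::size_t pos = lower(contentType).find(key);
    if (pos == std::string::npos) return "";
    const std::size_t start = pos + key.size();
    if (start < contentType.size() && contentType[start] == '"') {
        // Quoted form: take everything up to the closing quote.
        const std::size_t end = contentType.find('"', start + 1);
        if (end == std::string::npos) return contentType.substr(start + 1);
        return contentType.substr(start + 1, end - start - 1);
    }
    // Unquoted form: the value ends at a separator or at end of line.
    const std::size_t end = contentType.find_first_of("; \t\r\n", start);
    if (end == std::string::npos) return contentType.substr(start);
    return contentType.substr(start, end - start);
}
```

The boundary string returned here is what the body processing would later look for after the leading "--" on a line.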

The end of the email header is recognized by the first occurrence of a blank line. The processing done for the email body includes:

Look for lines that begin with two dashes "--". These are usually boundary lines. Following the boundary line is the MIME content description.
Email that starts with an HTML section is marked as "suspect" and moved into the junk_mail file.
A text section that contains the <HTML> or <BODY> tags is marked as suspect and moved to junk_mail.
Email that includes a base64 encoded section is never put in the inbox. It is either moved to junk_mail or discarded if the kill_base64 flag is set in the SpamFilterParams file.
Look for spam or kill words in a MIME text section.
Filter out the HTML tags and look for spam or kill words in the MIME HTML section. By filtering out the HTML tags the spammer tactic of hiding content in HTML is defeated.
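
The last step, filtering out HTML tags before searching for words, might look something like the sketch below. It is a simplified illustration rather than the filter's real code: stripHtmlTags and htmlContainsWord are hypothetical names, and the tag removal simply drops everything between '<' and '>' without handling comments, entities or quoted attribute values.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Remove everything between '<' and '>' so that words broken up by markup
// (for example "v<b></b>iagra") become contiguous again.
std::string stripHtmlTags(const std::string& html) {
    std::string text;
    text.reserve(html.size());
    bool inTag = false;
    for (char c : html) {
        if (c == '<') {
            inTag = true;
        } else if (c == '>' && inTag) {
            inTag = false;
        } else if (!inTag) {
            text += c;
        }
    }
    return text;
}

// True if any listed word or phrase appears in the HTML once tags are removed.
// The comparison ignores case.
bool htmlContainsWord(const std::string& html,
                      const std::vector<std::string>& words) {
    std::string text = stripHtmlTags(html);
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    for (std::string w : words) {
        std::transform(w.begin(), w.end(), w.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (!w.empty() && text.find(w) != std::string::npos) return true;
    }
    return false;
}
```

Searching the stripped text rather than the raw HTML is what lets a word list catch a term that has been split apart by empty tags.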

Other Features

The garbage_trace file
If the trace_garbage flag is included in the SpamFilterParams file and the keep_garbage flag is not included, a garbage mail trace file will be generated. This file contains a summary of the email headers. It can be looked at with email tools like UNIX mail and elm. The garbage_trace file tells the user which emails were discarded. If you expected an email and never got it, you can check this file. When the file reaches 50K bytes it is truncated and starts again.
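
The "truncate and start again" behaviour of the trace file can be sketched as below. This is a guess at one simple way to do it, not the filter's implementation: fileSize and appendTrace are hypothetical names, and taking 50K to mean 50 * 1024 bytes is an assumption.

```cpp
#include <fstream>
#include <string>

// Size of the file in bytes, or -1 if it cannot be opened.
long long fileSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long long>(in.tellg());
}

// Append one header summary to the trace file. If the file has already
// reached `limitBytes`, it is truncated first so the entry starts a new file.
void appendTrace(const std::string& path, const std::string& summary,
                 long long limitBytes = 50 * 1024) {  // "50K" read as 50 * 1024
    const long long size = fileSize(path);
    const bool tooBig = size >= 0 && size >= limitBytes;
    std::ofstream out(path, std::ios::binary |
                                (tooBig ? std::ios::trunc : std::ios::app));
    out << summary << '\n';
}
```

Checking the size before each write keeps the file bounded without any separate clean-up job, at the cost of losing the older entries whenever the limit is hit.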
Multiple emails in a single invocation
In the case of my ISP's email system, multiple emails may be received in one invocation. The mail filter will separate them and process each one. This also prevents a spammer or virus writer from piggybacking a virus on an email that appears valid.
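
One way to separate several emails arriving in a single invocation is sketched below. It assumes the messages are delimited mbox-style, by lines that begin with "From " (the documentation above does not say which delimiter the filter relies on), and splitMessages is a hypothetical name. Real mbox handling also has to consider blank lines before the separator and escaped "From " lines in bodies, which this sketch ignores.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split raw input into individual messages. A new message is assumed to
// start at every line beginning with "From " (mbox-style separator).
std::vector<std::string> splitMessages(const std::string& input) {
    std::vector<std::string> messages;
    std::istringstream in(input);
    std::string line;
    std::string current;
    while (std::getline(in, line)) {
        const bool startsMessage = line.compare(0, 5, "From ") == 0;
        if (startsMessage && !current.empty()) {
            messages.push_back(current);
            current.clear();
        }
        current += line;
        current += '\n';
    }
    if (!current.empty()) messages.push_back(current);
    return messages;
}
```

Each element of the returned vector could then be run through the header and body rules on its own, so a virus appended after a valid-looking message is judged separately.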

Generating the Documentation

The source code documentation for this software is formatted for doxygen, which is available from www.doxygen.org.

Assuming that doxygen is installed on your system, you can regenerate this documentation with the command


    make -f Makefile_win doxygen (on Windoz)
    pmake -f Makefile doxygen    (on Linux)

Copyright and Use

This email filter was written by Ian Kaplan, Bear Products International. It is copyrighted by Ian Kaplan, 2004, www.bearcave.com.

You may use this software for any purpose, with the two conditions listed below.

You must preserve this copyright notice in this software and any software derived from it.
You accept any risk entailed in using this software. By using this software, you acknowledge that you have a sophisticated background in software engineering and understand the way this software functions. You further acknowledge that using this software may result in the irretrievable loss of important email and you alone are responsible for this loss.

If either of these conditions is unacceptable, you may not use any part of this software.

Please send any bug fixes or suggested source changes to: iank@bearcave.com

Generated on Sat Mar 27 13:07:37 2004 for Mail Filter by doxygen 1.3.3