#include <MailFilter.h>
Public Types | |
enum | classification { BAD_VALUE, UNKNOWN, EMAIL, SUSPECT, GARBAGE } |
Public Member Functions | |
MailFilter (SpamParameters &param) | |
~MailFilter () | |
Private Member Functions | |
bool | isFromLine (const char *buf) |
bool | copyToTempFiles () |
const char * | getNewTempFileName () |
FILE * | openFile (const char *fileName, const char *mode, const char *callingFunc) |
void | closeFile (FILE *fp, const char *fileName, const char *callingFunc) |
bool | writeLine (const char *buf, FILE *fp, const char *fileName, const char *callingFunc) |
const char * | readLine (char *buf, const size_t bufSize, FILE *fp, const char *fileName, const char *callingFunc) |
void | append_file (const char *srcfile, const char *destfile) |
void | error_append_file (const char *srcfile, const char *destfile) |
classification | checkMail (const char *tempFileName, SpamParameters &params, HeaderInfo &headInfo) |
Private Attributes | |
size_t | mFileCount |
Logger | log |
std::vector< char * > | fileNames |
Definition at line 60 of file MailFilter.h.
|
Filter email read from stdin. If the input contains more than one email, each email is placed in its own temporary file. Each file is then processed to determine whether it is valid email or spam. Email is categorized as EMAIL (or UNKNOWN), SUSPECT, or GARBAGE. Email is marked as garbage when a "kill_word" is found. If the "kill_base64" flag is included in the SpamFilterParams file, email that contains base64-encoded data is also marked as garbage. Email marked as garbage is placed in the garbage file only when the SpamFilterParams flag "keep_garbage" is included; otherwise it is discarded. This keeps the number of headers that must be reviewed in the junk_mail file as low as possible.
Definition at line 513 of file MailFilter.C. References append_file(), checkMail(), copyToTempFiles(), error_append_file(), Logger::errorFound(), fileNames, Logger::getLogger(), SpamParameters::hasFlag(), Logger::log(), log, mFileCount, and HeaderInfo::subject().
    {
      // file for email
      const char* INBOX = "inbox";
      // file for email that is suspected of being spam
      const char* SPAM = "junk_mail";
      // file for email that is "garbage"
      const char* GARBAGE_MAIL = "garbage_mail";

      mFileCount = 0;
      log = pLogger->getLogger("MailFilter");
      log.log(Logger::DEBUG, "MailFilter", "enter");

      bool doGarbageTrace = params.hasFlag("trace_garbage") &&
                            (! params.hasFlag("keep_garbage"));

      // read mail from stdin into one or more temporary files
      if (copyToTempFiles()) {
        size_t numFiles = fileNames.size();
        for (size_t i = 0; i < numFiles; i++) {
          const char *tempFileName = fileNames[i];

          HeaderInfo headInfo( doGarbageTrace );

          // msg is a fixed-size buffer, so every write is bounded with snprintf
          char msg[256];
          classification kind = checkMail(tempFileName,
                                          params,
                                          headInfo);

          Logger::LogLevel mode;

          switch (kind) {
          case UNKNOWN:
            {
              // If the email is classified as "UNKNOWN" then something is
              // wrong. But we don't want to lose the email, so append it
              // to the inbox.
              snprintf(msg, sizeof(msg), "email classified as UNKNOWN");
              append_file( tempFileName, INBOX );
              mode = Logger::ERROR;
            }
            break;
          case EMAIL:
            {
              snprintf(msg, sizeof(msg), "Subject: %s added to mail in %s",
                       headInfo.subject(), INBOX );
              append_file( tempFileName, INBOX );
              mode = Logger::DEBUG;
            }
            break;
          case SUSPECT: {
            snprintf(msg, sizeof(msg), "Subject: %s added to suspected spam in %s",
                     headInfo.subject(), SPAM );
            append_file( tempFileName, SPAM );
            mode = Logger::DEBUG;
          }
            break;
          case GARBAGE: {
            if (params.hasFlag("keep_garbage")) {
              snprintf(msg, sizeof(msg), "Subject: %s is garbage, copied to %s",
                       headInfo.subject(), GARBAGE_MAIL );
              append_file( tempFileName, GARBAGE_MAIL );
            }
            else {
              snprintf(msg, sizeof(msg), "Subject: %s deleted", headInfo.subject() );
            }
            mode = Logger::DEBUG;
          }
            break;
          case BAD_VALUE: { // something went wrong processing the e-mail
            snprintf(msg, sizeof(msg), "Mail filter error: Subject = %s", headInfo.subject() );
            // Append it to the inbox so it is not lost. The error_append_file
            // function will add a marker to the file to indicate that there
            // was an error
            error_append_file( tempFileName, INBOX );
            mode = Logger::ERROR;
          }
            break;
          default: {
            snprintf(msg, sizeof(msg), "bad classification value" );
            mode = Logger::ERROR;
          }
            break;
          } // switch

          log.log( mode, "MailFilter", msg );

          if (! log.errorFound()) {
            // remove temporary file
            snprintf(msg, sizeof(msg), "removing %s", tempFileName );
            log.log(Logger::DEBUG, "MailFilter", msg );
            int unlinkRslt = unlink( tempFileName );
            if (unlinkRslt != 0) {
              snprintf(msg, sizeof(msg), "error unlinking %s. Error = %s\n",
                       tempFileName, strerror(errno));
              log.log(Logger::ERROR, "MailFilter", msg );
            }
          }
          else {
            snprintf(msg, sizeof(msg), "email that caused the error is in %s", tempFileName );
            log.log(Logger::ERROR, "MailFilter", msg );
          }
        } // for
      } // if copyToTempFiles

      log.log(Logger::DEBUG, "MailFilter", "exit");
    } // MailFilter constructor
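The routing performed by the constructor's switch statement can be summarized as a small lookup. The following is only a sketch: `destinationFor` is a hypothetical helper that does not exist in MailFilter, and the file names are the ones used in the constructor listing. It returns the file a message would be appended to, or an empty string when nothing is appended.

```cpp
#include <string>

// Mirrors the enum listed under Public Types.
enum classification { BAD_VALUE, UNKNOWN, EMAIL, SUSPECT, GARBAGE };

// Hypothetical helper (not part of MailFilter): name of the file the
// message is appended to, or "" when the message is not appended anywhere.
std::string destinationFor(classification kind, bool keepGarbage) {
    switch (kind) {
    case UNKNOWN:    // classification failed; keep the mail rather than lose it
    case EMAIL:
    case BAD_VALUE:  // processing error; the real code also adds an error marker
        return "inbox";
    case SUSPECT:
        return "junk_mail";
    case GARBAGE:
        return keepGarbage ? "garbage_mail" : "";
    }
    return "";  // unexpected value: nothing is appended
}
```

For example, `destinationFor(GARBAGE, false)` yields an empty string, which corresponds to the "deleted" branch of the constructor.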
|
~MailFilter Free the memory allocated for the temporary file names stored in fileNames. Definition at line 166 of file MailFilter.C. References fileNames.
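The destructor body is not reproduced on this page. Since getNewTempFileName() allocates each name with `new char[]`, the cleanup presumably looks something like the sketch below. `releaseFileNames` is a hypothetical free function used only for illustration; the actual destructor may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the cleanup done by ~MailFilter().
// Each entry was allocated with new char[], so it is released with delete[].
void releaseFileNames(std::vector<char*>& fileNames) {
    for (std::size_t i = 0; i < fileNames.size(); ++i) {
        delete[] fileNames[i];
    }
    fileNames.clear();  // drop the now-dangling pointers
}
```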
|
|
append_file Append srcfile to destfile. This is used once the destination of the email has been decided: the email is appended either to the junk file or back to the email box. A newline is written before each appended email so that emails do not run together. Definition at line 397 of file MailFilter.C. References Logger::log(), and log. Referenced by MailFilter().
    {
      char msgbuf[ 128];
      const char *read_only = "r";
      const char *append = "a+";
      FILE *read_fp;
      FILE *write_fp;

      log.log(Logger::DEBUG, "append_file", "enter");

      if ((read_fp = fopen( srcfile, read_only )) != NULL) {
        if ((write_fp = fopen( destfile, append )) != NULL) {
          char buf[ 4096 ];
          size_t amt_read;
          size_t amt_written;

          fprintf(write_fp, "\n"); // add a newline (blank line)

          while ((amt_read = fread(buf, 1, sizeof(buf), read_fp)) > 0) {
            amt_written = fwrite(buf, 1, amt_read, write_fp );
            if (amt_written < amt_read) {
              char *err_reason = strerror( errno );
              snprintf(msgbuf, sizeof(msgbuf), "error writing file %s. Reason = %s", destfile, err_reason);
              log.log(Logger::ERROR, "append_file", msgbuf );
              break; // stop after the first write failure
            }
          } // while

          fclose( write_fp );
        }
        else {
          char *err_reason = strerror( errno );
          snprintf(msgbuf, sizeof(msgbuf), "append_file: error opening file %s. Reason = %s",
                   destfile, err_reason );
          log.log(Logger::ERROR, "append_file", msgbuf );
        }
        fclose( read_fp );
      }
      else {
        char *err_reason = strerror( errno );
        snprintf( msgbuf, sizeof(msgbuf), "append_file: error opening file %s. Reason = %s",
                  srcfile, err_reason );
        log.log(Logger::ERROR, "append_file", msgbuf );
      }
      log.log(Logger::DEBUG, "append_file", "exit");
    } // append_file
|
checkMail Attempt to determine whether the email is valid or spam. The email header is checked first. If it is still unknown after the header check whether the email is valid or spam, the email body is checked. If the email is not found to be "guilty" (i.e., spam), it is assumed to be innocent (valid email).
Definition at line 466 of file MailFilter.C. References MailBody::checkBody(), MailHeader::checkHeader(), MailHeader::getBoundaryStr(), HeaderInfo::klass(), Logger::log(), and log. Referenced by MailFilter().
    {
      const char *mode = "r";
      classification mailClass = EMAIL;
      char msgbuf[256];
      log.log(Logger::DEBUG, "checkMail", "enter");

      FILE *fp = openFile(tempFileName, mode, "checkMail");
      if (fp != NULL) {
        MailHeader headFilter( params, headInfo );
        mailClass = headFilter.checkHeader(fp);
        if (mailClass == UNKNOWN) {
          MailBody bodyFilter( params, headInfo );
          const char *boundaryStr = headFilter.getBoundaryStr();
          mailClass = bodyFilter.checkBody(boundaryStr, fp);
          headInfo.klass(mailClass);
        }
        fclose( fp );
      }

      log.log(Logger::DEBUG, "checkMail", "exit");
      return mailClass;
    } // checkMail
|
copyToTempFiles Read one or more emails from stdin. Each email is copied into a temporary file, whose name is inserted into the fileNames vector. When I started writing this software, I thought that there was one email per invocation of the mail filter. While testing the mail filter, I found that more than one email may arrive at one time. I don't know why this is. It could be the way mail is handled by my ISP. It could be that so much spam is sent out that spam clumps together. Or it could be that spammers include two emails in one mail transaction. Whatever the case, this function separates each email into its own temporary file. Mail tools find the start of an email via the "From" line. The format for this line is:
|
|
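The source of copyToTempFiles() and the exact "From" line format are not reproduced on this page, so the following is only a rough sketch of the splitting idea. It assumes that a message starts at any line beginning with `From ` (the real isFromLine() may apply a stricter check) and it collects messages in memory rather than in temporary files. `looksLikeFromLine` and `splitMessages` are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumption: a leading "From " marks the start of a message.
// This is a stand-in for isFromLine(), not its actual logic.
bool looksLikeFromLine(const std::string& line) {
    return line.compare(0, 5, "From ") == 0;
}

// Split the input stream into one string per message.
std::vector<std::string> splitMessages(std::istream& in) {
    std::vector<std::string> messages;
    std::string line;
    while (std::getline(in, line)) {
        // Start a new message on a From line, or on the very first line
        // so that leading text is not lost.
        if (looksLikeFromLine(line) || messages.empty()) {
            messages.push_back(std::string());
        }
        messages.back() += line;
        messages.back() += '\n';
    }
    return messages;
}
```

Feeding the function a `std::istringstream` holding two messages back to back returns a vector of two strings, each beginning with its own `From ` line.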
error_append_file Append the email to the "inbox" file and include an error line in the email header. This is used when something has gone wrong while processing the email. The email should not be lost, so it is appended to the "inbox" with an error line added to its header. Definition at line 341 of file MailFilter.C. References Logger::log(), and log. Referenced by MailFilter().
    {
      static const char *SUBJECT = "subject";
      static const size_t SUBJECT_LEN = strlen( SUBJECT );
      char msgbuf[ 128];
      const char *read_only = "r";
      const char *append = "a+";
      FILE *read_fp;
      FILE *write_fp;

      log.log(Logger::DEBUG, "error_append_file", "enter");

      if ((read_fp = fopen( srcfile, read_only )) != NULL) {
        if ((write_fp = fopen( destfile, append )) != NULL) {
          char line[ 4096 ];

          fprintf(write_fp, "\n"); // add a newline (blank line)

          while (fgets(line, sizeof(line), read_fp) != 0) {
            fputs(line, write_fp);
            // append "X-MailFilterError:" after the "Subject:" line
            if (SpamUtil().match(line, SUBJECT_LEN, SUBJECT)) {
              fprintf(write_fp, "X-MailFilterError:\n");
            }
          } // while

          fclose( write_fp );
        }
        else {
          snprintf(msgbuf, sizeof(msgbuf), "error opening file %s", destfile );
          log.log(Logger::ERROR, "error_append_file", msgbuf );
        }
        fclose( read_fp );
      }
      else {
        snprintf( msgbuf, sizeof(msgbuf), "error opening file %s", srcfile );
        log.log(Logger::ERROR, "error_append_file", msgbuf );
      }
      log.log(Logger::DEBUG, "error_append_file", "exit");
    } // error_append_file
|
getNewTempFileName Create a new temporary file name and add it to the fileNames vector. Definition at line 50 of file MailFilter.C. References fileNames, and mFileCount. Referenced by copyToTempFiles().
    {
      const char* TEMP_NAME_ROOT = "mail_temp";
      const size_t BUF_SIZE = 64;
      char *pBuf = new char[ BUF_SIZE ];

      int pid = getpid();
      // create a unique temporary file name
      mFileCount++;
      // mFileCount is a size_t, so cast it to match the %lu conversion
      snprintf(pBuf, BUF_SIZE, "%s_%d_%lu", TEMP_NAME_ROOT, pid,
               (unsigned long)mFileCount );
      fileNames.push_back( pBuf );
      return pBuf;
    } // getNewTempFileName
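As an illustration of the naming scheme only, the same `mail_temp_<pid>_<count>` pattern can be produced without owning raw memory. `makeTempFileName` is a hypothetical helper, not part of MailFilter, and it takes the process id as a parameter so the result is predictable.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "mail_temp_<pid>_<count>" and returns it
// by value, so no delete[] is needed later.
std::string makeTempFileName(int pid, std::size_t count) {
    char buf[64];
    // snprintf never writes more than sizeof(buf) bytes, terminator included.
    std::snprintf(buf, sizeof(buf), "mail_temp_%d_%lu", pid,
                  static_cast<unsigned long>(count));
    return std::string(buf);
}
```

Returning a `std::string` is a design choice for the sketch; the class itself stores `char *` entries in fileNames and frees them in the destructor.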
|
isFromLine Check whether the line in buf is a From line that starts an email. The start of an email is recognized by the leading From line. This line has the format:
|
|
readLine Read a line of text. Write an error message to the log file if there is an error. Definition at line 137 of file MailFilter.C. References Logger::log(), and log. Referenced by copyToTempFiles().
    {
      char *inLine = 0;
      *buf = '\0';
      if ((inLine = fgets( buf, static_cast<int>(bufSize), fp )) == 0) {
        if (! feof(fp)) {
          char msgbuf[128];
          char *err_reason = strerror( errno );
          if (fileName != 0) {
            snprintf(msgbuf, sizeof(msgbuf), "Error reading from %s. Reason = %s", fileName, err_reason );
          }
          else if (fp == stdin) {
            snprintf(msgbuf, sizeof(msgbuf), "Error reading from stdin. Reason = %s", err_reason );
          }
          else {
            // no file name was supplied and the stream is not stdin;
            // without this branch msgbuf would be logged uninitialized
            snprintf(msgbuf, sizeof(msgbuf), "Error reading from file. Reason = %s", err_reason );
          }
          log.log(Logger::ERROR, callingFunc, msgbuf );
        }
      }
      return inLine;
    } // readLine
|
writeLine Write a line of text. Write a message to the log file if there is an error. Note that the fputs result is compared to EOF, rather than zero. This is necessary for portability, since fputs is only guaranteed to return a non-negative value on success, not necessarily zero. Definition at line 115 of file MailFilter.C. References Logger::log(), and log. Referenced by copyToTempFiles().
    {
      bool writeOK = true;

      if (fputs( buf, fp ) == EOF) {
        writeOK = false;
        char msgbuf[128];
        char *err_reason = strerror( errno );
        snprintf(msgbuf, sizeof(msgbuf), "Error writing to %s. Reason = %s", fileName, err_reason );
        log.log(Logger::ERROR, callingFunc, msgbuf );
      }
      return writeOK;
    } // writeLine
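The comparison against EOF can be isolated in a few lines. This is a minimal sketch with logging left out; `writeLineChecked` is a hypothetical name, not a member of MailFilter.

```cpp
#include <cstdio>

// Hypothetical simplified writeLine(): reports success or failure only.
bool writeLineChecked(const char* buf, std::FILE* fp) {
    // fputs returns EOF on failure and some non-negative value on success,
    // so the portable test is against EOF, not against zero.
    return std::fputs(buf, fp) != EOF;
}
```

A check such as `fputs(buf, fp) == 0` may report a failure on platforms where a successful call returns a positive value.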
|
fileNames The names of the temporary files created for the emails read from stdin. Definition at line 72 of file MailFilter.h. Referenced by getNewTempFileName(), MailFilter(), and ~MailFilter().
|
log The Logger object. Definition at line 70 of file MailFilter.h. Referenced by append_file(), checkMail(), copyToTempFiles(), error_append_file(), isFromLine(), MailFilter(), readLine(), and writeLine().
|
mFileCount A count of the temporary files created for the emails read from stdin. Definition at line 68 of file MailFilter.h. Referenced by getNewTempFileName(), and MailFilter().