Although I no longer use Spamassassin myself, I did set it up once and it worked for me. People have asked me about it, so I have collected the pieces here. As with most of the pages on this site, you should be able to download this one in PDF format. Big news: I am using Spamassassin again.
Qmail is very secure, fast and complete mail server software. Its home page, maintained by its author Dan Bernstein, is terse (but very good); for more extensive information look at www.qmail.org. A very popular description of how to set up Qmail is Life with qmail. My favorite installation method, however, is under Debian GNU/Linux with Gerrit Pape's unofficial binary packages.
Spamassassin is a filter for Internet mail messages. It analyses the headers and matches patterns in the message text to find hints that a given message should be considered spam. Each filter rule is weighted with a score; if the total score reaches a certain value, the threshold, the mail is considered spam. No filter is perfect: spam can get through Spamassassin untouched, and legitimate messages can be wrongly tagged as spam. Continuous refinement of the filter rules, and some accompanying methods, aim to improve the detection rate. I consider Spamassassin one of the very good spam filters.
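To make the scoring idea concrete, here is a toy illustration in shell and awk. It is not how Spamassassin computes its scores internally; the input format (one rule name and score per line), the rule names and the function name is_spam_by_score are all hypothetical. The threshold 5.0 in the usage example matches Spamassassin's default required score.

```shell
#!/bin/sh
# Toy illustration of score-based classification (NOT Spamassassin's code).
# Input on stdin: one "RULE_NAME score" pair per line (hypothetical format).
# Prints the total and the threshold; exits 0 if the total reaches the
# threshold (spam) and 1 otherwise.
is_spam_by_score() {
    awk -v t="$1" '
        { total += $2 }
        END {
            printf "score=%.1f threshold=%.1f\n", total, t
            exit (total >= t + 0 ? 0 : 1)
        }
    '
}
```

For example, `printf 'RULE_A 2.5\nRULE_B 3.0\n' | is_spam_by_score 5.0` prints `score=5.5 threshold=5.0` and exits with status 0, because 2.5 + 3.0 reaches the threshold.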
ifspamh is a /bin/sh script written by James Grinter to simplify mail filtering with Spamassassin.
I don't like spam filters:
When you plan to use Spamassassin you have to decide if you:
Furthermore, Spamassassin can be installed as a daemon; you then use a small client program to send messages through the daemon and get the filtered message back. This improves the response time per message, because Spamassassin does not have to be loaded from disk into memory each time a message is filtered, but only once at startup. You might, however, consider the permanently resident process a burden on small systems and a waste on fast systems with low mail volume (a typical workstation), and it can be considered an increased security risk.
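With the daemon variant, the client spamc takes the place of the spamassassin command in a delivery pipeline: it reads the message on stdin, hands it to spamd and writes the filtered message to stdout (if spamd cannot be reached, spamc passes the message through unchanged). A minimal sketch, assuming spamd is already running, safecat's maildir wrapper is installed, and the system-wide control file /etc/qmail/control/defaultdelivery described further down this page is used:

```shell
#!/bin/sh
# Sketch only: use the spamd/spamc pair instead of starting spamassassin
# for every message. Assumptions: spamd is running, "maildir" (from safecat)
# is in qmail's PATH, and this control file path matches your installation.
# spamc reads the message on stdin and writes the filtered message to stdout.
printf '%s\n' '| spamc | maildir ./Maildir/' > /etc/qmail/control/defaultdelivery
```

In the Life with qmail setup this file is read when qmail is started, so restart qmail afterwards.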
Here I only describe message tagging. If you want to block spam outright, consult the Spamassassin documentation for the appropriate command-line options.
Per-user filtering with Spamassassin is done by putting the following line into the user's .qmail file:
Now for system-wide delivery: you can change the default local delivery method. If you use the "Life with qmail" setup, or modify Gerrit Pape's installation to do the same, then you will find a file /etc/qmail/control/defaultdelivery. For delivery into Maildirs this file typically contains the line:
./Maildir/
To filter all local deliveries, replace that line with:
| spamassassin -P | maildir ./Maildir/
With this method, a user can bypass spam checking by setting up their own .qmail file. Outgoing mail is not checked either; only locally delivered mail is filtered.
If you want to check and tag all mail going through your server, you need to do it when the mail is queued. The original Spamassassin instructions tell you to apply the QMAILQUEUE patch to your Qmail programs. With this patch, the Qmail programs that normally run qmail-queue (such as qmail-smtpd and qmail-inject) first look at the environment variable QMAILQUEUE; if it is set, the program named there is run instead of qmail-queue. That program receives the message exactly as qmail-queue would and must pass it on to the real qmail-queue itself, so QMAILQUEUE cannot simply be set to "spamassassin -P". You need a small wrapper that runs spamassassin -P and pipes the result into qmail-queue, and you have to make sure that QMAILQUEUE points to this wrapper at the right moments.
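One way to set the variable "at the right moments" with a QMAILQUEUE-patched Qmail is to let tcpserver set it for incoming SMTP connections through its rules file. The sketch below assumes Qmail under /var/qmail, tcprules from ucspi-tcp, and the rules file /etc/qmail/tcp.smtp compiled to /etc/qmail/tcp.smtp.cdb (the file used by the tcpserver command further down this page); the wrapper name qmail-spamqueue is purely hypothetical. The wrapper filters the message on file descriptor 0 and leaves descriptor 1, where qmail-queue expects the envelope, untouched.

```shell
#!/bin/sh
# Sketch only: route incoming SMTP mail through Spamassassin via QMAILQUEUE.
# Assumptions: QMAILQUEUE-patched qmail in /var/qmail, tcprules installed,
# hypothetical wrapper name /var/qmail/bin/qmail-spamqueue.

# 1. The wrapper: filter the message (fd 0) and hand it to the real
#    qmail-queue, which inherits fd 1 (the envelope) unchanged.
cat > /var/qmail/bin/qmail-spamqueue <<'EOF'
#!/bin/sh
spamassassin -P | /var/qmail/bin/qmail-queue
EOF
chmod 755 /var/qmail/bin/qmail-spamqueue

# 2. Set QMAILQUEUE for remote SMTP clients only; localhost may relay and is
#    not filtered. WARNING: this overwrites an existing rules file; merge the
#    lines into your own rules instead if you already have one.
cat > /etc/qmail/tcp.smtp <<'EOF'
127.:allow,RELAYCLIENT=""
:allow,QMAILQUEUE="/var/qmail/bin/qmail-spamqueue"
EOF
tcprules /etc/qmail/tcp.smtp.cdb /etc/qmail/tcp.smtp.tmp < /etc/qmail/tcp.smtp
```

If spamassassin fails, this pipeline still passes whatever arrived to qmail-queue, so a production wrapper needs extra error handling. Also note that spamassassin now runs under qmail-smtpd's softlimit; a limit as low as the -m 2000000 in the run script on this page is likely too small for a Perl program of this size.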
Another way is to set up a small script that invokes the filtering and queueing chain, and to put it in place of qmail-queue, keeping the original program as qmail-queue.orig. The script should look similar to this one:
#!/bin/sh
spamassassin -P | /var/qmail/bin/qmail-queue.orig
mkdir -p ~qmaild/.spamassassin
chown qmaild:nofiles ~qmaild/.spamassassin
chmod u=rwx,g=rx,o= ~qmaild/.spamassassin
#!/bin/sh
QMAILDUID=`id -u qmaild`
NOFILESGID=`id -g qmaild`
MAXSMTPD=`cat /var/qmail/control/concurrencyincoming`
exec softlimit -m 2000000 \
tcpserver -v -R -l 0 -x /etc/qmail/tcp.smtp.cdb -c "$MAXSMTPD" \
-u "$QMAILDUID" -g "$NOFILESGID" 0 smtp qmail-smtpd 2>&1
After making the respective adjustments to your system, start Qmail again.
Don Brown has set up Spamassassin on his system with virtual domains. He also filtered outgoing mail, but told me: "... I recently had to turn off scanning outgoing mail since that causes all 4 processors to run at ~100%."
Sorry, folks: Don Brown is very busy and has not been able to give me more details within a reasonable time. Please don't ask me about virtual domain setups. Contributions, however, are very welcome; several people have asked for it!
The original text of this section follows:
maildir is part of "safecat", a program that safely writes data received on stdin to disk, using Dan Bernstein's Maildir algorithm.
You can
It seems that recent Debian packages do not provide "maildir", which is a simple wrapper around safecat. You can use this script instead:
#!/bin/sh
# Copyright (c) 2000, Len Budney. See COPYING for details.
# Change this to your safecat's path!
exec \
/usr/local/bin/safecat "$1"/tmp "$1"/new
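The safety of Maildir delivery comes from never writing directly into new/: the message is first written to a uniquely named file in tmp/ and only then linked into new/, so a mail reader never sees a half-written message. The following sketch illustrates that idea only; it is not safecat. The function name deliver_maildir and the simplified unique name (seconds, process ID and host name, the classic time.pid.host pattern) are assumptions, it relies on a date(1) that supports +%s (as GNU and BSD date do), and unlike a careful delivery agent it does not flush the file to disk before linking it.

```shell
#!/bin/sh
# Hypothetical sketch of Maildir delivery: deliver_maildir DIR < message
# Returns 0 on success, 111 (qmail's "temporary failure") otherwise.
deliver_maildir() {
    dir=$1
    # Simplified unique name: time.pid.host
    name="$(date +%s).$$.$(uname -n)"
    tmpfile="$dir/tmp/$name"
    # Never clobber an existing file in tmp/
    [ -e "$tmpfile" ] && return 111
    # Write the complete message into tmp/ first
    cat > "$tmpfile" || { rm -f "$tmpfile"; return 111; }
    # Hard-link into new/ (fails if the name is taken), then drop the tmp/ entry
    ln "$tmpfile" "$dir/new/$name" || { rm -f "$tmpfile"; return 111; }
    rm -f "$tmpfile"
}
```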
When Spamassassin decides that a message is probably spam, it can act in two ways: it can merely signal "this mail could be spam" through its exit status, or it can (additionally) modify the message so that the user can see the verdict. The second option is called message tagging.
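Using only the exit status fits naturally with Qmail, because a program listed in a .qmail file controls delivery through its own exit code: 0 means success (continue with the next line), 99 means success but skip the remaining lines, and 111 means temporary failure (retry later). The sketch below, with the hypothetical function name spam_gate, drops suspected spam before a following ./Maildir/ line is reached. It assumes a checker that exits 1 for spam and 0 otherwise, as spamc -c does; the SPAMCHECK variable is a hypothetical hook for substituting another command. Any other exit code defers delivery rather than losing mail.

```shell
#!/bin/sh
# Hypothetical .qmail helper: stop delivery of suspected spam using only the
# checker's exit status; the message itself is never modified.
# Assumption: the checker exits 1 for spam and 0 for non-spam ("spamc -c" does).
SPAMCHECK=${SPAMCHECK:-"spamc -c"}

spam_gate() {
    # Feed the message on stdin to the checker and discard its output.
    $SPAMCHECK >/dev/null 2>&1
    case $? in
        0) return 0 ;;    # not spam: qmail continues with the next .qmail line
        1) return 99 ;;   # spam: qmail counts it as delivered and skips the rest
        *) return 111 ;;  # checker trouble: ask qmail to retry later
    esac
}
```

A script that calls spam_gate as its last command could then be listed in ~/.qmail as `| /path/to/script`, followed by the `./Maildir/` line.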
By default, a message suspected to be spam is tagged by changing its subject line so that it says the message is spam, and by adding some lines to the body that give the evidence for this verdict. Attachments are MIME-defanged, i.e. their MIME type is changed to a "harmless" type, so that the mail reader will not accidentally open an application with the possibly virus-laden content.
All of this behaviour can be changed in the configuration. Some people only want an "X-Spam:" header added, which stays invisible when reading the message but can be used to sort the mail into a "Spam" folder. Others want to stop delivery to a local mailbox as soon as suspected spam is detected; in this case the mail is not modified, and the exit status of the filter command decides whether the mail is delivered.
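For the header-only variant, the delivery step just has to look for the marker Spamassassin adds. A minimal sketch, assuming the tagged message carries the header X-Spam-Flag: YES (which Spamassassin adds to messages it classifies as spam); the function name has_spam_flag is hypothetical. Only the header section, up to the first empty line, is examined, so a quoted header in the body does not count.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the message on stdin carries the header
# "X-Spam-Flag: YES" (case-insensitive), exit 1 otherwise.
has_spam_flag() {
    awk '
        BEGIN { status = 1 }
        /^\r?$/ { exit }                                  # end of headers
        tolower($0) ~ /^x-spam-flag:[ \t]*yes/ { status = 0; exit }
        END { exit status }
    '
}
```

A delivery script could run spamassassin -P into a temporary file, test that file with has_spam_flag, and hand it to `maildir ./Maildir/.Spam/` or `maildir ./Maildir/` accordingly; the .Spam subfolder follows the Maildir++ naming convention and is an assumption here.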
Aaron Sherman addressed the objections I raised against spam filters:
...
I hope this doesn't sound too salesy, but your page seems to be out of date with respect to the current development, so I thought I should update you in the hopes that you would update it. Good luck, however you end up coping with spam (I'm down to about 0-2 messages that sneak through per day, and zero false-positives out of about 200 spams per day, sometimes as many as 100 per hour on peak days, and that's just MY inbox!)
We exchanged some more thoughts, and then:
AS>> directly. SA now ships with a program called spamc which will
JL> Some people will not want to run this, because of security concerns as
JL> stated in Spamassassin's manual.
Hmm.... I'm not sure what documentation you're referring to there. Certainly, being a C program, spamc runs the risk of introducing buffer overflows, but the advantage of the design there is that spamc is a tiny program that doesn't try to *think* about your mail. It just reads in a buffer and stupidly hands it off to spamd, reads it back and prints it out. The most complex part of spamc's job is handling the case where spamd fails (in which case spamc simply replicates its input to output). I've audited this code myself, as should anyone who knows C and intends to use spamc on systems they care about - this is not because spamc is insecure, but because you should always audit critical code. There are security concerns associated with spamd for intra-system security which are addressed by simply not allowing users to define rules in their local configurations. That's a solved problem, though it does mean that only the admin can add rules to spamassassin when using spamd.
Unless you go out of your way to change the installation defaults (e.g. you allow users to create rules), spamc and spamd are quite secure, and I would not be concerned about running them in a production environment (and I do).
JL> Filtering time is also of little concern IMHO, but how about memory
JL> consumption and CPU-load? Could you maybe elaborate about this issue
JL> (as improved with spamc) or give a link, so that I can include it in my
JL> page?
In every way, the use of spamc and spamd reduces the load on your system. The hardest part of SpamAssassin's job is parsing a HUGE database of rules and turning it into Perl code which is then parsed by Perl and executed.
spamd performs both of these parsing steps ONCE, and simply forks for incoming requests. On all modern operating systems, fork performs a copy-on-write for all non-stack data areas which means that 99% of spamd's memory usage will be shared across all instances (remember that Perl's "code" is actually in C's data area, as Perl5 lacks a JIT compiler).
And Aaron provided some useful links with comments:
Thanks go, of course, to the authors of all that prime-quality software.
My readers have helped me to extend this document. Special thanks go to:
Yura Gusev, Don Brown, Len Budney, Tomas Macek, Aaron Sherman.