How to use Spamassassin together with Qmail

Georg Lehner

June 29, 2003

Abstract

Although I don't (by now) use Spamassassin myself, I set it up once and it worked for me. People have asked me about it, so I have put the pieces together here. As with most of the pages on this site, you should be able to download it in PDF format.

Big news?! I am using Spamassassin - again.

Software

Qmail is a very secure and fast and complete and... mailserver software. The home page, maintained by its author Dan Bernstein, is terse (but very good); if you want more abundant material, look at www.qmail.org. A very popular guide to setting up Qmail is Life with qmail. My favorite installation method, however, is under Debian GNU/Linux with Gerrit Pape's unofficial binary packages.

Spamassassin is a filter for Internet mail messages. It does header analysis and text matching on the message body to collect hints about whether a given message has to be considered Spam. Each filter rule is weighted with a score; if the sum of the scores reaches a certain value - the threshold - the mail is considered Spam. Messages that are spam can get through spamassassin untouched, and messages that are not spam can be tagged as such. The continuous refinement of the filter rules, and some accompanying methods, try to improve the detection rate. I suppose spamassassin is one of the very good spam filters.
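
To make the scoring idea concrete, here is a minimal sketch in /bin/sh. It is not how Spamassassin is implemented: the function name classify, the rule names and the scores are made up for illustration, and 5.0 is just a sample threshold. It reads one "rule score" pair per line for the rules that matched, adds up the scores and compares the sum with the threshold.

```shell
#!/bin/sh
# classify THRESHOLD
# Reads lines of the form "RULE_NAME SCORE" on stdin (hypothetical format),
# sums the scores and prints "spam TOTAL" or "ham TOTAL".
classify() {
    # LC_ALL=C keeps the decimal point a "." regardless of locale.
    LC_ALL=C awk -v threshold="$1" '
        { total += $2 }
        END {
            verdict = (total >= threshold) ? "spam" : "ham"
            printf "%s %.1f\n", verdict, total
        }'
}

# Example call with two made-up rule hits:
#   printf "MADE_UP_SUBJECT_RULE 2.5\nMADE_UP_BODY_RULE 3.0\n" | classify 5.0
```

Because the verdict depends only on the sum, a single high-scoring rule or several low-scoring ones can push a message over the threshold, which is also why false positives and false negatives both happen.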

ifspamh is a /bin/sh script written by James Grinter to simplify mail filtering with Spamassassin.

Considerations

I don't like spamfilters:

The best and most compact introduction to Spam reduction I have found on the net is Chris Hardie's page on the subject.

When you plan to use Spamassassin you have to decide if you:

Chris's page can help you decide what fits you best.

Furthermore, spamassassin can be installed as a daemon; then you use a small client program to send messages through the daemon and get the filtered message back. This helps with response time per message, as spamassassin does not have to be loaded from disk into memory each time a message gets filtered, but only once at startup. You might, however, consider the increased static load a burden on small systems and a waste on fast systems with low mail volume (a typical workstation), and it can be considered an increased security risk.

Filtering during mail delivery

To begin with, I only describe message tagging. If you want to block spam hits, look at the spamassassin documentation, which tells you the right command-line options.

Filtering mail with spamassassin per user is accomplished by putting the line:

| spamassassin -P |maildir ./Maildir/
into .qmail in your home directory. You need maildir, which is part of the safecat package. The mail is piped into Spamassassin; -P tells it to write the filtered and possibly tagged message to stdout, rather than to deliver it to the mail spool. maildir puts it where it would have gone anyway with a decent Qmail installation: into the default Maildir directory.
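
If you prefer not to edit the file by hand, the step can be sketched as a small shell function. This is only a sketch: the function name install_dotqmail and the .qmail.bak backup name are my own invention, and it assumes you want the single delivery line shown above and nothing else in the file.

```shell
#!/bin/sh
# install_dotqmail HOMEDIR
# Writes the spamassassin delivery line into HOMEDIR/.qmail,
# keeping a copy of any existing file as HOMEDIR/.qmail.bak (hypothetical name).
install_dotqmail() {
    dotqmail="$1/.qmail"
    if [ -f "$dotqmail" ]; then
        cp -p "$dotqmail" "$dotqmail.bak" || return 1
    fi
    # One delivery instruction: filter, then store in the default Maildir.
    printf '%s\n' '| spamassassin -P | maildir ./Maildir/' > "$dotqmail" || return 1
    # Readable by the user, not writable by group or others.
    chmod 644 "$dotqmail"
}

# Typical call:
#   install_dotqmail "$HOME"
```

The backup matters because an existing .qmail may already contain forwarding or mailing list instructions that you do not want to lose.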

Now as to system-wide delivery, you can change the default local delivery method. If you use the "Life with qmail" setup, or modify Gerrit Pape's installation to do this, then you will find a file /etc/qmail/control/defaultdelivery. This file may contain the two lines:

| dot-forward .forward

./Maildir/

Now you'll change it to:
| spamassassin -P | dot-forward .forward

| spamassassin -P | maildir ./Maildir/

Maybe you'll leave "spamassassin" out of the first line, because you consider that spam forwarded to a local user account is checked by the second rule later on, and so you spare the double invocation of Spamassassin. Bad luck for the recipients if the mail goes to an un-spamassassined location.

This method has the feature that a user can override spam checking by setting up a .qmail file. Also, outgoing mail is not checked, only mail delivered locally.

Filtering during queue processing

If you want to check/tag all mail going through your server, you need to do it when mail is queued. The original Spamassassin instructions tell you to apply the QMAILQUEUE patch to your Qmail program. With this patch, whenever qmail-queue is to be invoked, the environment variable QMAILQUEUE is consulted; if it is set, its contents are taken as a program which is executed as a filter before the corresponding message is piped through Qmail's qmail-queue. So you have to arrange for this environment variable to be set to "spamassassin -P" at the right moments.

Another way is to set up a small script which invokes the filter/spooling chain and put it into qmail-queue's place. The script should look similar to this one:

#!/bin/sh

spamassassin -P | qmail-queue.orig

After stopping Qmail, you go to the /var/qmail/bin directory, and rename qmail-queue to qmail-queue.orig.
cd /var/qmail/bin; mv qmail-queue qmail-queue.orig
Then you copy the script into place, maybe like we did in this example:
cp spamassassin-script /var/qmail/bin/qmail-queue
The script will be run as the qmaild user, so we need to create a .spamassassin directory in this user's home:
mkdir ~qmaild/.spamassassin

chown qmaild.nofiles ~qmaild/.spamassassin

chmod u=rwx,g=rx,o= ~qmaild/.spamassassin
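
The rename-and-copy steps above are easy to get wrong on a second run: if qmail-queue has already been replaced, renaming it again would overwrite the real program with the wrapper. Here is a minimal sketch of a guard against that. The function name install_queue_wrapper is made up, and the mode 755 for the wrapper is my assumption; adjust it to your installation.

```shell
#!/bin/sh
# install_queue_wrapper BINDIR WRAPPER
# Moves BINDIR/qmail-queue to BINDIR/qmail-queue.orig and puts WRAPPER in its
# place. Refuses to do anything if qmail-queue.orig is already there.
install_queue_wrapper() {
    bindir="$1"
    wrapper="$2"
    if [ -e "$bindir/qmail-queue.orig" ]; then
        echo "qmail-queue.orig already exists in $bindir, not touching anything" >&2
        return 1
    fi
    # mv keeps owner and mode of the original program.
    mv "$bindir/qmail-queue" "$bindir/qmail-queue.orig" || return 1
    cp "$wrapper" "$bindir/qmail-queue" || return 1
    chmod 755 "$bindir/qmail-queue"
}

# Typical call, as root and with Qmail stopped:
#   install_queue_wrapper /var/qmail/bin spamassassin-script
```

To undo the change, stop Qmail and move qmail-queue.orig back over qmail-queue.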

qmail-queue, when invoked from qmail-smtpd, is often subject to a memory limit. For example, here is Gerrit Pape's run script for /service/qmail-smtpd:
#!/bin/sh
QMAILDUID=`id -u qmaild`
NOFILESGID=`id -g qmaild`
MAXSMTPD=`cat /var/qmail/control/concurrencyincoming`
exec softlimit -m 2000000 \
  tcpserver -v -R -l 0 -x /etc/qmail/tcp.smtp.cdb -c "$MAXSMTPD" \
    -u "$QMAILDUID" -g "$NOFILESGID" 0 smtp qmail-smtpd 2>&1

The line with "exec softlimit -m 2000000" limits tcpserver and all its subprocesses to a maximum memory consumption of two (2) million bytes. If this leaves too little memory, the spamassassin process may die. Yury Gusev is using (by now) "exec softlimit -m 66000000". Please tell us about your experiences. You should also consider running the spamassassin daemon and invoking only spamc, the spamassassin client, in this setup, instead of launching a whole spamassassin process for each mail.
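
A minimal sketch of that last suggestion, assuming spamd is already running and spamc can be found in the PATH of the qmaild user. The function name write_spamc_wrapper is made up; the path /var/qmail/bin/qmail-queue.orig is the one from the renaming step above. I have not measured how much memory spamc needs, so the softlimit value still has to be tried out.

```shell
#!/bin/sh
# write_spamc_wrapper FILE
# Creates a qmail-queue replacement that hands each message to spamd
# through the small spamc client instead of starting spamassassin itself.
write_spamc_wrapper() {
    cat > "$1" <<'EOF' || return 1
#!/bin/sh
# Filter the message through spamd, then queue the result.
spamc | /var/qmail/bin/qmail-queue.orig
EOF
    chmod 755 "$1"
}

# Typical call, writing the wrapper to a scratch file first:
#   write_spamc_wrapper ./spamc-script
```

The generated script has the same shape as the spamassassin wrapper shown earlier, so it can be copied into place in exactly the same way.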

After making the respective adjustments to your system, you start Qmail again.

Spam hurts, filtering does hurt too

Don Brown has set up spamassassin on his system with virtual domains. He also did filtering of outgoing mail but told me: "... I recently had to turn off scanning outgoing mail since that causes all 4 processors to run at ~100%."

Virtual domains

News! Matt Simmerson just mailed me a link to a HowTo of his, describing virtual domain filter setup. Thanks Matt!

Sorry folks - Don Brown is very busy and has not been able to give me more details within a reasonable time. Please don't ask me about virtual domain setups. Contributions, however, are very welcome; several people have been asking for this!

The original text of this section follows:

Don Brown also tried to figure out how to set up spamassassin individually for individual accounts in virtual domains, but it did not work for him. However, he set up an individual spamassassin configuration for each virtual domain.
If anybody is interested in the second issue, or has a clue how to fix the first one, please drop me a note.

Where is the "maildir" program?

maildir is part of "safecat", a program that safely stores data received on stdin to disk, using Dan Bernstein's Maildir algorithm.

You can install it with

apt-get install safecat
in a standard Debian GNU/Linux installation, or get it from the author's web page.

Recently it seems that Debian's package does not provide "maildir", which is a simple wrapper around safecat. You can use:

| spamassassin -P | safecat ./Maildir/tmp ./Maildir/new
or create "maildir" by yourself:
#!/bin/sh
# WARNING: This file was auto-generated. Do not edit!
# Copyright (c) 2000, Len Budney. See COPYING for details.
# Change this to your safecat's path!
exec \
/usr/local/bin/safecat "$1"/tmp "$1"/new

Remember to put it in a place that is reachable through the PATH used by .qmail deliveries.
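
Since you cannot be sure which of the two programs your system provides, the choice can be sketched as a small function that prints the matching delivery line. The function name delivery_line is made up, and the check only looks at the PATH of the shell you run it in, which may differ from the PATH Qmail uses for deliveries.

```shell
#!/bin/sh
# delivery_line
# Prints a .qmail delivery line that uses maildir if it is available,
# falls back to calling safecat directly, and fails if neither is found.
delivery_line() {
    if command -v maildir >/dev/null 2>&1; then
        echo '| spamassassin -P | maildir ./Maildir/'
    elif command -v safecat >/dev/null 2>&1; then
        echo '| spamassassin -P | safecat ./Maildir/tmp ./Maildir/new'
    else
        echo 'neither maildir nor safecat found in PATH' >&2
        return 1
    fi
}

# Typical call, printing the line so you can paste it into .qmail:
#   delivery_line
```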

Spamassassin close up

When spamassassin thinks it has found a spam mail, it can act in two ways: only signal with a status code that "this mail could be spam", or (additionally) change the message so that the user can see it. The second option is called message tagging.

The default procedure for tagging a suspected spam message is to change the subject line so that it tells you the message is spam, and to add some lines to the body of the message in which the evidence for the claim is given. Attachments are MIME-defanged, i.e. their MIME type is changed to a "harmless" type, so the mail reader program will not accidentally open an application with the probably virus-laden content.

All of this behaviour can be changed in the configuration. Some people only want a header "X-Spam:" added, which is invisible when reading the message but can be used for putting the mail into a "Spam" folder. Others want to stop delivery to a local mailbox the moment a suspected Spam is detected - in this case the mail is not modified, but the exit status of the filter command is used as a condition for mail delivery.
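
Sorting on a header can be sketched in a few lines of shell. This is an illustration only: the function name is_tagged is made up, and it assumes that your Spamassassin version marks tagged mail with an "X-Spam-Flag: YES" header, so check a tagged message to confirm the exact header name before relying on it.

```shell
#!/bin/sh
# is_tagged
# Reads a mail message on stdin and succeeds (exit status 0) if its header
# block contains "X-Spam-Flag: YES". The body is deliberately ignored.
is_tagged() {
    # sed prints everything up to the first empty line (end of headers) and
    # then quits; grep looks for the flag header, ignoring case.
    sed '/^$/q' | grep -qi '^X-Spam-Flag:[[:space:]]*YES'
}

# Typical use inside a delivery script, with the message in a file:
#   if is_tagged < message.txt; then echo "goes to the Spam folder"; fi
```

Looking only at the header block matters, because a legitimate message may quote the same line in its body, for example when somebody forwards you a spam report.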

Other Opinions (about SA)

Aaron Sherman is addressing the issues I bring up against spam filters:

...

  1. You mention SA is slow, but only refer to running "spamassassin" directly. SA now ships with a program called spamc which will drastically improve your filtering time by avoiding the 0.5+ second (on slower machines that's a HUGE plus) startup time of parsing the SA rule-base. It just forwards mail to spamd, the SA server.
  2. An upcoming version of SA will improve on this even further by reducing the overhead in many of the tests, but still allowing SA to take full advantage of Perl at run-time. SA is an ongoing project and as hundreds of sites add it to their mail arsenal every day, the pressure is really on us to improve it in every possible way.
  3. As for virtual domains, you might want to check the archive of the spamassassin-talk mailing list at spamassassin.org. I think someone brought this up recently, and I know it's been brought up and discussed a few times in the past.
With SA's current version, the combination of SA's huge database of header and body tests; many DNS blacklists weighted by the accuracy of their results on a large corpus of known spam and non-spam; Razor2 (spam fuzzy checksum tests against a distributed database with trust-weighting); and the Bayesian filter (which trains based on SA's existing body of rules), SA has become - to my knowledge - the best spam-filtering product in the world.

I hope this doesn't sound too salesy, but your page seems to be out of date with respect to the current development, so I thought I should update you in the hopes that you would update it. Good luck, however you end up coping with spam (I'm down to about 0-2 messages that sneak through per day, and zero false-positives out of about 200 spams per day, sometimes as many as 100 per hour on peak days, and that's just MY inbox!)

We exchanged some thoughts, then...:

AS>> directly. SA now ships with a program called spamc which will

JL> Some people will not want to run this, because of security concerns as

JL> stated in spamassassins manual.

Hmm.... I'm not sure what documentation you're referring to there. Certainly, being a C program, spamc runs the risk of introducing buffer overflows, but the advantage of the design there is that spamc is a tiny program that doesn't try to *think* about your mail. It just reads in a buffer and stupidly hands it off to spamd, reads it back and prints it out. The most complex part of spamc's job is handling the case where spamd fails (in which case spamc simply replicates its input to output). I've audited this code myself, as should anyone who knows C and intends to use spamc on systems they care about - this is not because spamc is insecure, but because you should always audit critical code. There are security concerns associated with spamd for intra-system security which are addressed by simply not allowing users to define rules in their local configurations. That's a solved problem, though it does mean that only the admin can add rules to spamassassin when using spamd.

Unless you go out of your way to change the installation defaults (e.g. you allow users to create rules), spamc and spamd are quite secure, and I would not be concerned about running them in a production environment (and I do).

JL> Filtering time is also of little concern IMHO, but how about memory

JL> consumption and CPU-load? Could you maybe elaborate about this issue

JL> (as improved with spamc) or give a link, so that I can include it in my

JL> page?

In every way, the use of spamc and spamd reduces the load on your system. The hardest part of SpamAssassin's job is parsing a HUGE database of rules and turning it into Perl code which is then parsed by Perl and executed.

spamd performs both of these parsing steps ONCE, and simply forks for incoming requests. On all modern operating systems, fork performs a copy-on-write for all non-stack data areas which means that 99% of spamd's memory usage will be shared across all instances (remember that Perl's "code" is actually in C's data area, as Perl5 lacks a JIT compiler).

And Aaron provided some useful links with comments:

http://spamassassin.org/dist/INSTALL
non-qmail installation details, but you should read it first anyway. Some good info on things like Perl 5.8 and Unicode are important to everyone.
http://spamassassin.taint.org/faq/index.cgi
The all-important but not fully comprehensive FAQ
http://lists.sourceforge.net/lists/listinfo/spamassassin-talk
And I quote, "For those sites running qmail as your MTA, the "qmail" directory contains two ways to integrate spamc with your system. Kobe Lenjou has contributed patch to qmail-scanner, and John Peacock has contributed a QMAILQUEUE enabled qmail-spamc. See the README*'s for more details."
http://lists.sourceforge.net/lists/listinfo/spamassassin-talk
The user mailing list with archive.

Credits

Go of course to the authors of all that prime-quality software.

My readers have helped me to extend this document. Special thanks go to:

Yura Gusev, Don Brown, Len Budney, Tomas Macek, Aaron Sherman.


Author: Jorge.Lehner homepage