[tech] [wheel] Spamassassin broken
Bob Adamson
bob at ucc.asn.au
Sun Apr 29 20:19:20 WST 2012
Hi all,
This morning I had a little free time and finally decided to take a look at
our broken spamassassin setup on mooneye. This is my understanding of it,
but this is all new to me, so please correct me if I've gone wrong
somewhere.
I blew away the existing bayesian db using 'sa-learn --clear', and then ran
'sa-learn --sync', which seems to force it to create a new one. Now we've got
to check the threshold settings and retrain it to detect UCC-specific spam.
In addition to the bayesian filter settings, /etc/spamassassin/local.cf also
has some other filter settings which allocate a score based on some other
criteria (such as: sent to a ucc group alias, html format, contains certain
words that we've decided to block).
In the absence of a required_score setting in the config file, I assume it's
on the default of 5 [3]. For those normal people who don't know what that
setting is, it's the threshold score for determining whether something is
spam or not. If 5 or higher, it's spam. The other important setting is
bayes_auto_learn_threshold_spam, which is the score at which the bayesian
filter will take that spam email and learn from it [3].
Here are the offending lines in local.cf that I believe caused our bayesian
filter to learn the wrong thing:
====================================================
bayes_auto_learn_threshold_spam 7.0
header __UCC_ALIAS ALL =~ /(secretary|camp|coke|webmasters|door|doorgroup|david|dave|chris|webmaster)\@[^ ,]*ucc\./
describe __UCC_ALIAS Sent to UCC alias
meta UCC_ALIAS_HTML (__UCC_ALIAS && HTML_MESSAGE)
describe UCC_ALIAS_HTML UCC alias mail with html
score UCC_ALIAS_HTML 7.0
score BAYES_00 0 0 0.0 -2.599
score BAYES_05 0 0 0.0 -0.413
score BAYES_20 0 0 0.8 -1.951
score BAYES_40 0 0 1.5 -1.096
score BAYES_50 0 0 4.2 0.001
score BAYES_60 0 0 5.4 1.0
score BAYES_80 0 0 6.0 2.0
score BAYES_95 0 0 9.4 3.0
score BAYES_99 0 0 10.0 3.5
=====================================================
The first line tells the bayesian filter that if an email has a score of 7.0
or higher, it should be used as a spam email for learning. The second block
is one which allocates a score of 7.0 to any emails for the listed group
aliases if it's an html type email. The third block adds an extra score from
the third column based on the probability of the email being spam, according
to the bayesian filter. As you can see, any html email to those lists starts
with a score of 7, and it can only go up, meaning it will always be treated
as spam, regardless of content. Since it has a high enough score, it is then
used for auto-learning on the bayesian filter. Oops!
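To make the feedback loop concrete, here is a rough Python sketch of the arithmetic described above. It is only a simplified model, not SpamAssassin's actual code: it assumes required_score is the default of 5, considers only the UCC_ALIAS_HTML rule plus one BAYES_* rule, and ignores the extra conditions the real auto-learner applies. The function name classify is made up for illustration.

```python
# Simplified model of the scoring described above; not SpamAssassin's own logic.
REQUIRED_SCORE = 5.0    # assumed default, since local.cf sets no required_score
AUTO_LEARN_SPAM = 7.0   # bayes_auto_learn_threshold_spam from local.cf
UCC_ALIAS_HTML = 7.0    # score UCC_ALIAS_HTML from local.cf

# Third-column BAYES_* scores copied from local.cf.
BAYES_COL3 = {
    "BAYES_00": 0.0, "BAYES_05": 0.0, "BAYES_20": 0.8,
    "BAYES_40": 1.5, "BAYES_50": 4.2, "BAYES_60": 5.4,
    "BAYES_80": 6.0, "BAYES_95": 9.4, "BAYES_99": 10.0,
}


def classify(bayes_rule, alias_score=UCC_ALIAS_HTML):
    """Return (total, tagged_as_spam, auto_learned_as_spam) for an HTML alias mail."""
    total = alias_score + BAYES_COL3[bayes_rule]
    return total, total >= REQUIRED_SCORE, total >= AUTO_LEARN_SPAM


if __name__ == "__main__":
    for rule in BAYES_COL3:
        total, is_spam, learned = classify(rule)
        print(f"{rule}: total={total:.1f} spam={is_spam} auto_learn={learned}")
```

Under these assumptions every row comes out as both spam and auto-learned, whatever the bayesian filter thinks of the content.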
We are definitely looking at the third column in that bayes block, since we
pass the --local flag to spamassassin in /etc/default/spamassassin, which
disables network tests while bayes stays enabled [1][2].
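As I read [1], a score line with four values is indexed by whether bayes and network tests are enabled, and a line with one value applies everywhere. A small Python sketch of that lookup follows. The helper names score_set_index and effective_score are made up, and it only handles the one-value and four-value forms.

```python
def score_set_index(bayes_enabled, network_tests_enabled):
    """Return the 0-based column used from a four-value score line.

    Column order per the scoring docs:
      0: no bayes, no network   1: no bayes, network
      2: bayes, no network      3: bayes, network
    """
    return (2 if bayes_enabled else 0) + (1 if network_tests_enabled else 0)


def effective_score(score_line, bayes_enabled, network_tests_enabled):
    """Pick the score that applies from a line like 'score BAYES_50 0 0 4.2 0.001'."""
    parts = score_line.split()
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value is used regardless of which tests are enabled.
        return values[0]
    if len(values) != 4:
        raise ValueError("this sketch only handles one or four scores")
    return values[score_set_index(bayes_enabled, network_tests_enabled)]


if __name__ == "__main__":
    # With --local (no network tests) and bayes on, the third column applies.
    print(effective_score("score BAYES_50 0 0 4.2 0.001", True, False))
```

If we turn network tests on later, the fourth column would apply instead, and those values are much lower.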
So things I think we should change:
- I've already adjusted the setting for bayes_auto_learn_threshold_spam up
to 10 so we don't have a broken filter now that it has been reset
- Adjust the html filter on list emails down to below 5.0 and let the
bayesian filter increase the score if it's spam
- Looking at [2], I think we should investigate the use of network tests to
reduce our spam level; it's not like mooneye is struggling for power
- I noticed skip_rbl_checks is set to true, so we're not checking any DNS
blacklists. Would it be worth trying to have this on again?
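As a quick sanity check of the first two changes, here is a hedged Python sketch of how totals would come out. The alias score of 3.0 is just a placeholder for "below 5.0", not a value anyone has agreed on. It uses the third-column bayes scores from local.cf, assumes the default required_score of 5, and ignores every other rule.

```python
REQUIRED_SCORE = 5.0      # assumed default threshold
AUTO_LEARN_SPAM = 10.0    # bayes_auto_learn_threshold_spam after the change
PROPOSED_ALIAS_HTML = 3.0 # hypothetical value, only known to be "below 5.0"


def verdict(bayes_score, alias_score=PROPOSED_ALIAS_HTML):
    """Return (total, tagged_as_spam, auto_learned_as_spam) under the proposed settings."""
    total = alias_score + bayes_score
    return total, total >= REQUIRED_SCORE, total >= AUTO_LEARN_SPAM


if __name__ == "__main__":
    # Third-column scores for a clearly-ham, undecided and clearly-spam mail.
    for name, bayes_score in [("BAYES_00", 0.0), ("BAYES_50", 4.2), ("BAYES_99", 10.0)]:
        print(name, verdict(bayes_score))
```

With these numbers an HTML mail the bayesian filter considers ham stays under the threshold, and only mail it is very sure about gets fed back into learning.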
Cheers, Bob
[1] http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html#scoring_options
[2] http://wiki.apache.org/spamassassin/UsingNetworkTests
[3] http://wiki.apache.org/spamassassin/BasicConfiguration
-----Original Message-----
From: Matt Johnston
Sent: Sunday, July 31, 2011 11:26 PM
To: tech at ucc.gu.uwa.edu.au
Subject: Re: [tech] [wheel] Spamassassin broken
This should go to tech@ not just wheel@, providing some
notes on UCC's spamassassin. If anyone wants to see the
bits on mooneye that are wheel-only let me know.
The context is that spamassassin was tagging large amounts
of genuine mail as spam so it's been (either permanently or
temporarily) disabled.
Filter on "X-SpamTest-Status: SPAM" from ITS's Ironports
instead, it's more reliable anyway.
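For anyone filtering outside procmail, the same check can be sketched in Python roughly as below. The function name flagged_by_ironport is made up, and it assumes the header value simply starts with "SPAM" as quoted above; the exact format the Ironports emit is not confirmed here.

```python
from email import message_from_string


def flagged_by_ironport(raw_message):
    """Return True if the message carries an X-SpamTest-Status header starting with SPAM."""
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive; a missing header yields the empty default.
    status = msg.get("X-SpamTest-Status", "")
    return status.strip().upper().startswith("SPAM")


if __name__ == "__main__":
    sample = "Subject: cheap pills\nX-SpamTest-Status: SPAM\n\nbody text\n"
    print(flagged_by_ironport(sample))
```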
Matt
On Sun, Jul 31, 2011 at 10:37:24PM +0800, Bob Adamson wrote:
> I'm just gonna put it out there - I have no idea how our mail spam
> filtering works or where it's configured. I've had a bit of a look at my
> procmailrc file and afaict it just looks for [SPAM] in the subject line.
> Anyway, could you possibly explain how/where it's configured and what
> exactly needs to change?
To expand on what's what:
- There's a spamd server for Spamassassin on mooneye. It
listens on port 783
- When it used to be enabled, postfix (in
/etc/postfix/master.cf) had "smtpd -o content_filter=spamfilter:".
That then ran:
- /usr/local/sbin/newspamfilter.pl is what Bernard (iirc)
wrote to run non-local mail through
/usr/local/sbin/spamfilter which feeds mail to spamd. I
think the latter script is what's packaged with spamassassin.
- The spamd learning happens with the "spamass" account. It
has a logfile ~spamass/learnlog. I just took a look at it
and it was complaining about
"bayes: bad permissions on journal, can't read:
/var/spamassassin-nobody/.spamassassin/bayes_journal"
because that file's owned as root. I've now chowned it
back to spamass. I wonder if that was related...
- There's a special spamass crontab:
spamass@mooneye:~$ crontab -l -u spamass
# m h dom mon dow command
53/30 * * * * ~/learnspam
- That learns stuff that gets forwarded to the spamass
user. I think spamassassin also learned from spam it
filtered, see all the rules in /etc/spamassassin/local.cf
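Since the journal ownership problem could easily come back (for example after someone runs sa-learn as root), a check along these lines might be worth having around. This is only a sketch: the helper names owner_name and check_owner are made up, and the path and the spamass user are the ones mentioned above.

```python
import os
import pwd

JOURNAL = "/var/spamassassin-nobody/.spamassassin/bayes_journal"


def owner_name(path):
    """Return the owning user's name, or the numeric uid as a string if unknown."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_owner(path, expected_user):
    """Return (ok, actual_owner) for the given file."""
    actual = owner_name(path)
    return actual == expected_user, actual


if __name__ == "__main__":
    if os.path.exists(JOURNAL):
        ok, actual = check_owner(JOURNAL, "spamass")
        print(f"{JOURNAL}: owned by {actual}" + ("" if ok else " (expected spamass)"))
    else:
        print(f"{JOURNAL} not found or not readable from this account")
```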
So perhaps we could try and fix the
/var/spamassassin-nobody/ bayesian database and then turn
spamassassin back on.
Matt