Re: Email Filtering


From: *Vanguard* (no-email_at_post-reply-in-newsgroup.invalid)
Date: 04/23/04


Date: Fri, 23 Apr 2004 02:37:40 -0500


"dev" said in news:u5mFDCOKEHA.952@TK2MSFTNGP12.phx.gbl:

Corrections:

> 1st filter.
> IF FROM CONTAINS @ then DELETE

That should be "If the From header does *NOT* contain an @ character"
(which is part of the legal syntax for an e-mail address). This rule
should actually get moved to after the "keep if known sender" and other
whitelisting rules. It is possible a newsletter to which you subscribed
may not even include an "@" in the From header and instead just list the
name of the list used to send you your subscribed newsletter.
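
To illustrate the corrected test and why its position matters, here is
a minimal Python sketch. It is not how Outlook or Outlook Express
evaluates rules; the function name `classify` and the `known_senders`
set are hypothetical names made up for this example.

```python
def classify(from_header, known_senders):
    """Return 'keep', 'delete', or 'continue' for one From header.

    known_senders is a hypothetical whitelist of lowercase strings
    (addresses or list names) that should always be kept.
    """
    sender = from_header.strip().lower()

    # Whitelisting comes first, so a subscribed newsletter whose From
    # header has no "@" is kept before the "@" test can delete it.
    if any(known in sender for known in known_senders):
        return "keep"

    # Corrected rule: delete when the From header does NOT contain "@".
    if "@" not in sender:
        return "delete"

    # No decision here; later rules get their turn.
    return "continue"
```

With `known_senders = {"weekly gadget news"}`, a From header of
"Weekly Gadget News" is kept even though it has no "@", while an
unknown sender with no "@" is deleted.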

> 2nd filter.
> IF SENDER IS IN MY ADDRESS BOOK (Personal) then MOVE TO INBOX

That rule is NOT available in Outlook Express. It is available in
Outlook. It is a whitelisting rule. In Outlook, Microsoft doesn't
permit listing multiple address books within a rule, so you will need N
copies of this rule for your N address books that you want to whitelist.
Also, you do NOT need to move the incoming message. Just use the "stop
processing more rules" clause; it comes into your Inbox, it's a known
sender, so just leave it there.

> 3rd filter (optional for listserver).
> IF TO CONTAINS <list address> then MOVE TO <list mailbox>

This is a whitelisting rule. You may have newsletters or other bulk
e-mails to which you have subscribed that you want to keep but do not
want to pollute your address book with contacts for them, especially
since the domain might be always the same but the username portion might
change (like for a newsletter that gets a different username for each
"issue" of that newsletter).

> 4th filter.
> IF BODY CONTAINS / then DELETE

Boy, that will delete a LOT of e-mails, most of which do NOT have a URL.
A sentence like "You can do this and/or that" will match on this rule
and get that message deleted. If you were trying to match on a URL
syntax, then search for "://". However, anytime someone sends you a
link to their web site or to someone else's, like a message from your
friend telling you where to find that online hardware manual or a link
to a file on their FTP server that you asked for, then you're going to
delete their message (unless you have them included in the whitelisting
rules at the top of your list). If they send using some freebie webmail
provider, it is also likely the spammy promotional signature that gets
forcibly appended to their messages will have a URL in it, too. The URL
filter plug-in to SpamPal will check if a URL within the body of the
message goes to a known spam site, and mark that message as spam.
Spammers will often attempt to obfuscate a URL by using hex characters
or an invalid syntax (which the e-mail client and/or HTML rendering
engine will repair into a valid URL). HTML-Modify for SpamPal will
detect these spammy tricks for URLs.
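
The difference between matching "/" and matching "://" can be shown
with a short Python sketch. The function names are invented for this
example, and the hex decoding only illustrates one obfuscation trick;
it is not a description of how SpamPal or HTML-Modify work internally.

```python
from urllib.parse import unquote


def contains_slash(body):
    """The original rule: matches any '/' anywhere in the body."""
    return "/" in body


def contains_url_scheme(body):
    """The narrower test: matches only the '://' of a URL."""
    return "://" in body


def decode_hex_escapes(url):
    """Undo %XX hex escapes, one way a URL can be obfuscated."""
    return unquote(url)
```

"You can do this and/or that" matches the first test but not the
second. A message containing "http://example.com/manual.pdf" matches
both, which is why a URL rule still needs the whitelisting rules ahead
of it.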

> IF CONTENT-TYPE DOESN'T CONTAIN TEXT/PLAIN then DELETE

This is available in Outlook Express? If available in Outlook (which
would have to be version 2003 since I don't see it in OL2002), you will
be deleting all HTML-formatted e-mails, including those from friends and
family. All webmail providers issue HTML-formatted e-mails (although
you may have a choice to send in plain text format). Unless I am
replying to a message that was in plain-text format, all my original
e-mails are in HTML format. Everyone I know sends me their messages in
HTML format. I work at a technology company. Only the vehement Unix
old-timers using mail, pine, or some other text-only mail client send
e-mails in plain text format.

You should NOT be checking that the content-type is plain text. You
should be checking that there is no *matching* MIME type for a plain
text version of the MIME part containing the HTML-coded version of the
message. Spammers who use HTML will often omit the plain text MIME
part. However, spammers' use of HTML is on the decline as it is a
trigger that can raise the spamminess score for a message and
make it more liable to get filtered out. Also, some webmail providers,
like Hotmail, never insert the plain-text MIME part if you compose using
HTML, so all your friends using Hotmail will have their messages get
deleted. The HTML-Modify plug-in for SpamPal has a setting
where you can have it identify HTML-formatted e-mails that are missing
the plain text MIME part (I had to disable it because I have several
friends sending from Hotmail accounts).
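
As a rough illustration of "HTML part with no matching plain-text
part", here is a sketch using Python's standard email package. The
function name is hypothetical, and this is only an approximation of
the idea: it checks whether any text/plain part exists anywhere in the
message rather than pairing each HTML part with its own alternative.

```python
from email import message_from_string


def html_without_plain_text(raw_message):
    """True if the message has a text/html part but no text/plain part."""
    msg = message_from_string(raw_message)

    # walk() visits the message itself and every MIME part inside it.
    types = {part.get_content_type() for part in msg.walk()}

    return "text/html" in types and "text/plain" not in types
```

A multipart/alternative message carrying both text/plain and text/html
returns False, and so does an ordinary plain-text message. An HTML-only
message returns True, which is exactly the case that also catches
friends sending from webmail that omits the plain-text part.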

Some additional notes:

You should use the "stop processing more rules" clause (available in
both Outlook and Outlook Express) in every rule unless you have a need
to OR several of them together. Whitelisting rules should go at the top
of the rules list. Any rule that requires checking the content (i.e.,
body) of the message should be at the end, since it will force a
download of the message. Header-only rules don't need to download
the message, so you could delete spam at the mail server rather than
waste the time to download it.
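
The ordering advice can be sketched in Python as follows. This is a toy
model, not the Outlook rules engine: `Rule`, `run_rules`, and
`fetch_body` are all hypothetical names, and `fetch_body` stands in for
whatever would download the message from the server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    matches: Callable[[dict, Optional[str]], bool]
    action: str               # e.g. "keep" or "delete"
    needs_body: bool = False  # True forces a download of the message


def run_rules(rules, headers, fetch_body):
    """Apply rules in list order; the first match wins."""
    body = None
    for rule in rules:
        # Download the body only when a body rule is actually reached.
        if rule.needs_body and body is None:
            body = fetch_body()
        if rule.matches(headers, body):
            # Equivalent of "stop processing more rules".
            return rule.action
    return "keep"
```

With whitelisting and other header-only rules listed first, a message
they match is decided without `fetch_body` ever being called. Only
messages that fall through to a body rule cost a download.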

Note that HTML-formatted e-mails can get around simplistic word lists
like Rob wants to use. A spammer can use "por<i></i>n for
fr<bogustag>ee". You see the HTML rendered as "porn for free" but any
rule looking at the content will still see the intervening HTML tags.
So any word list or rule with a word list looking for "porn" won't see
it in the message. Word lists are not only highly ineffective but also
create lots of false positives. "To adjust the sextant, ..." and "Turn
right on Sussex Avenue" will get triggered by "sex" in your word list.
Bayesian filters work on weighting of words based on their historical
use in previous messages marked as non-spam or spam. I hear OL2003 now
includes a Bayesian filter. SpamPal has its Bayesian plug-in. Once
Microsoft included a Bayesian filter, we started to see the
proliferation of word-list spam that had no payload (i.e., a bunch of
words but no message or links) in an attempt to pollute the database for
the Bayesian filter, but some Bayesian filters will periodically expire
and purge the "noise" out of the database to prevent this attempted
corruption.
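
Both word-list problems are easy to reproduce in a few lines of Python.
The function names are made up for this example. The tag stripping and
whole-word matching are shown only to make the failure modes visible;
they are assumptions of this sketch, not a recommended filter, and
spammers have plenty of other ways around them.

```python
import re


def naive_match(text, word):
    """Simplistic word-list test: plain substring search."""
    return word in text.lower()


def strip_tags(html):
    """Remove anything that looks like an HTML tag, bogus or not."""
    return re.sub(r"<[^>]*>", "", html)


def whole_word_match(text, word):
    """Match only when the word stands alone, not inside another word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

The raw text "por<i></i>n for fr<bogustag>ee" does not match "porn"
until the tags are stripped. In the other direction, the substring test
for "sex" fires on both "sextant" and "Sussex", while the whole-word
test does not.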

Except for newbie spammer mails (which are not the threat), rules you
define in Outlook or Outlook Express will not provide decent spam
filtering. You'll end up with way too many rules to manage, a huge list
of blocked senders which are primarily just bogus e-mail addresses,
false positives on e-mails when using simplistic or unweighted word
lists, and other problems with such minimal anti-spam mechanisms.
You'll end up not detecting a lot of the spam and deleting too many
non-spam messages. Get an anti-spam product to assist you in getting
rid of the spam, like SpamPal and its various plug-ins (all free with no
spyware, banners, crippled features, demoware, or other stupidity). If
your e-mail provider includes anti-spam features, enable them. Use some
anti-spam software to additionally help get rid of any crap that leaks
past the provider's anti-spam filters. Then define some rules to
accommodate *additional* or "personal" tests beyond what are available in
the anti-spam software.

-- 
____________________________________________________________
*** Post replies to newsgroup.  Share with others.
*** Email: domain = ".com" and append "=news=" to Subject.
____________________________________________________________
