NNSquad - Network Neutrality Squad


[ NNSquad ] Re: [IP] Comcast's "Evil Bot" Scanning Project


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <20091010172706.GF26405@vortex.com>, Lauren Weinstein
<lauren@vortex.com> writes

> but
>the point is that spam *is* e-mail, and the idea with spam is to look
>and behave as much like non-spam e-mail as possible.  

In practice [and experience] it does a rubbish job of it, provided you
measure the right thing ...

... I've published a series of papers at CEAS (all online if you want to
look at them) about a spam detection system I've developed (and which
has been deployed at a medium-sized UK ISP since 2003 -- it spotted two
customers with problems just this morning, and one more with a
substantial "loop" with the same email circulating thousands of times).

    Richard Clayton: Stopping Spam by Extrusion Detection. First
    Conference on Email and Anti-Spam (CEAS 2004), Mountain View CA,
    USA, July 30--31 2004. 

    Richard Clayton: Stopping Outgoing Spam by Examining Incoming Server
    Logs. Second Conference on Email and Anti-Spam (CEAS 2005), Stanford
    CA, USA, July 21--22 2005. 

    Richard Clayton: Using Early Results from the 'spamHINTS' Project to
    Estimate an ISP Abuse Team's Task. Third Conference on Email and
    Anti-Spam (CEAS 2006), Mountain View CA, USA, July 28--29 2006. 

That system processes email server logs, but in principle you could
extract the appropriate data with DPI kit (by reconstructing the protocol
phase of each SMTP exchange) and apply the same set of heuristics to the
same (good) effect.

The key insight for my system is to count not the number of emails sent,
but the number of emails that fail to be delivered.  This heuristic
works very well because lists of addresses to be spammed tend to be
inaccurate and dated -- plus you are leveraging the recipients'
anti-spam technology, which will be refusing bad emails and causing
delivery failures that way.
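
As an illustration only (this is not the deployed system), the counting
idea above can be sketched in a few lines of Python. The record format,
the function names and the max_failures parameter are all assumptions
made for the example; real server logs would need parsing first.

```python
from collections import Counter


def failure_counts(records):
    """Tally sent and failed deliveries per sender.

    `records` is a hypothetical iterable of (sender, delivered) pairs,
    where `delivered` is False when the delivery attempt failed.
    Returns a dict mapping sender -> (sent, failed).
    """
    sent = Counter()
    failed = Counter()
    for sender, delivered in records:
        sent[sender] += 1
        if not delivered:
            failed[sender] += 1
    return {sender: (sent[sender], failed[sender]) for sender in sent}


def flag_senders(records, max_failures=20):
    """Return senders whose count of FAILED deliveries exceeds the limit.

    The limit is an arbitrary illustrative value, not a tuned trigger level.
    Note that the volume of mail sent plays no part in the decision.
    """
    counts = failure_counts(records)
    return sorted(s for s, (_, f) in counts.items() if f > max_failures)
```

A sender with a large volume of mail and few failures is not flagged,
whereas a sender with modest volume and many failures is.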

There are rather more heuristics in the system these days (and an
exception list, because otherwise we cannot aggressively tune the
trigger levels), but the basic principles haven't changed in six years.

>So the issue of whether someone
>sending out a legitimate mailing to a big mailing list will be
>mischaracterized as a spammer is at least worthy of interest.

Most real mailing lists have failure rates under 25% (Lauren might tell
us what this one averages). Most spam sending runs have failure rates
over 40% (often much over).
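
A rough sketch of how those two figures might be applied to a single
sending run, in Python. The function name, the labels and the treatment
of the 25% to 40% gap as "ambiguous" are assumptions for illustration;
the thresholds are simply the round numbers quoted above, not the
system's actual trigger levels.

```python
def classify_run(sent, failed, list_max=0.25, spam_min=0.40):
    """Label a sending run by its delivery failure rate.

    `sent` and `failed` are message counts for one run. The defaults
    are the approximate rates quoted in the text; rates between them
    are left undecided here.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    rate = failed / sent
    if rate > spam_min:
        return "likely spam"
    if rate < list_max:
        return "likely mailing list"
    return "ambiguous"


# Example runs with made-up counts.
print(classify_run(1000, 50))
print(classify_run(1000, 600))
print(classify_run(1000, 300))
```

In practice an exception list for known senders would be checked before
any such rule is applied.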

- -- 
richard                                                   Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary 
Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBStDNhZoAxkTY1oPiEQLHmgCfQvZyy5dJm/u10WVOSPQxLFdidw4AoJwK
0RVq7I5O6u2V6LHz0fv/KBUn
=O0Br
-----END PGP SIGNATURE-----