NNSquad - Network Neutrality Squad
[ NNSquad ] Re: [IP] Comcast's "Evil Bot" Scanning Project
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <20091010172706.GF26405@vortex.com>, Lauren Weinstein
<lauren@vortex.com> writes

> but
>the point is that spam *is* e-mail, and the idea with spam is to look
>and behave as much like non-spam e-mail as possible.

In practice [and experience] it does a rubbish job of it, provided you
measure the right thing ...

... I've published a series of papers at CEAS (all online if you want
to look at them) about a spam detection system I've developed. It has
been deployed at a medium-sized UK ISP since 2003; just this morning it
spotted two customers with problems, and one more with a substantial
"loop" in which the same email was circulating thousands of times.

  Richard Clayton: Stopping Spam by Extrusion Detection. First
  Conference on Email and Anti-Spam (CEAS 2004), Mountain View, CA,
  USA, July 30-31, 2004.

  Richard Clayton: Stopping Outgoing Spam by Examining Incoming Server
  Logs. Second Conference on Email and Anti-Spam (CEAS 2005), Stanford,
  CA, USA, July 21-22, 2005.

  Richard Clayton: Using Early Results from the 'spamHINTS' Project to
  Estimate an ISP Abuse Team's Task. Third Conference on Email and
  Anti-Spam (CEAS 2006), Mountain View, CA, USA, July 28-29, 2006.

That system processes email server logs, but in principle you could
extract the same data with DPI (deep packet inspection) kit, by
reconstructing the protocol phase of SMTP interchanges, and apply the
same set of heuristics to the same (good) effect.

The key insight for my system is to count not the number of emails
sent, but the number of emails that fail to be delivered. This
heuristic works very well because lists of addresses to be spammed tend
to be inaccurate and out of date. You are also leveraging the
recipients' anti-spam technology, which refuses bad emails and so
causes further delivery failures.
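To make the "count failures, not volume" idea concrete, here is a much-simplified sketch. It is not the deployed system described above, which has many more heuristics. The record type `DeliveryAttempt`, the functions `failure_rates` and `flag_suspects`, and the parameters `threshold`, `min_attempts` and `exceptions` are all hypothetical. The default trigger of 0.40 is an assumption loosely based on the figures given later in this message (real mailing lists usually fail under 25%, spam runs usually over 40%). The `exceptions` set stands in for the exception list mentioned below.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryAttempt:
    """One outgoing delivery attempt, as might be recovered from server logs.

    Hypothetical record format; a real system would parse actual MTA logs.
    """
    sender: str      # customer identifier, e.g. an IP address or account
    delivered: bool  # False if the remote server refused or bounced it


def failure_rates(attempts):
    """Return {sender: (failures, total)} for every sender seen."""
    counts = defaultdict(lambda: [0, 0])
    for a in attempts:
        c = counts[a.sender]
        c[1] += 1
        if not a.delivered:
            c[0] += 1
    return {s: (f, t) for s, (f, t) in counts.items()}


def flag_suspects(attempts, threshold=0.40, min_attempts=100,
                  exceptions=frozenset()):
    """Flag senders whose delivery-failure rate reaches the trigger level.

    threshold and min_attempts are illustrative assumptions, not tuned
    values; exceptions models a list of known-good senders.
    """
    suspects = []
    for sender, (failed, total) in failure_rates(attempts).items():
        if sender in exceptions or total < min_attempts:
            continue  # exempted, or too little traffic to judge
        rate = failed / total
        if rate >= threshold:
            suspects.append((sender, rate))
    # Worst offenders first
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# A list server with 10% failures and a compromised PC with 60% failures.
log = ([DeliveryAttempt("list-server", True)] * 180 +
       [DeliveryAttempt("list-server", False)] * 20 +
       [DeliveryAttempt("infected-pc", True)] * 80 +
       [DeliveryAttempt("infected-pc", False)] * 120)
print(flag_suspects(log))  # [('infected-pc', 0.6)]
```

Both senders send the same volume, so a simple volume count cannot tell them apart. Only the failure ratio separates the mailing list from the spam run.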
There are rather more heuristics in the system these days (and an
exception list, because otherwise we could not tune the trigger levels
so aggressively), but the basic principles haven't changed in six years.

>So the issue of whether someone
>sending out a legitimate mailing to a big mailing list will be
>mischaracterized as a spammer is at least worthy of interest.

Most real mailing lists have failure rates under 25% (Lauren might tell
us what this one averages). Most spam-sending runs have failure rates
over 40% (often much over).

- -- 
richard                                                   Richard Clayton

Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.  Benjamin Franklin 11 Nov 1755

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBStDNhZoAxkTY1oPiEQLHmgCfQvZyy5dJm/u10WVOSPQxLFdidw4AoJwK
0RVq7I5O6u2V6LHz0fv/KBUn
=O0Br
-----END PGP SIGNATURE-----