Monday, October 12, 2009

Detecting Spam With Checksums

Checksum-based filters rely on the principle that messages sent in bulk are nearly identical to each other apart from a few variations. They strip out everything that varies between copies of a message, reduce what remains to a checksum, and compare this value with a list of values in a database. If a match occurs, the message is treated as spam; otherwise, it is allowed to pass through.
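The strip, reduce, and compare steps can be sketched in a few lines of Python. This is only an illustration: the normalization rules, the choice of SHA-256, and the names normalize, checksum, known_spam and is_spam are assumptions made for the example, not the method of any particular filter.

```python
import hashlib
import re

def normalize(body: str) -> str:
    """Strip the parts assumed to vary between copies of a bulk message."""
    text = body.lower()                    # ignore case differences
    text = re.sub(r"\d+", "", text)        # drop digits (order IDs, counters)
    text = re.sub(r"[^a-z]+", " ", text)   # collapse punctuation and whitespace
    return text.strip()

def checksum(body: str) -> str:
    """Reduce what remains after normalization to a single checksum."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()

# Hypothetical database of checksums already reported as spam.
known_spam = {checksum("Buy cheap watches now! Order #12345")}

def is_spam(body: str) -> bool:
    """A message is flagged when its checksum matches a known value."""
    return checksum(body) in known_spam
```

With these rules, "BUY cheap watches NOW!!  Order #99871" reduces to the same checksum as the reported message and is flagged, while an unrelated message is not.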

The advantage of this method is that both administrators and users can take part, which increases the number of people fighting spam. However, spammers have come up with ingenious tricks, such as inserting meaningless information, known as "hashbusters", in the middle of the message to make it difficult for the filter to identify it as spam. This works because each message containing a different hashbuster generates a unique checksum. The result is an ongoing battle between the developers of checksum software and the developers of spam-generating software.
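The effect of a hashbuster is easy to reproduce. The sketch below assumes a plain SHA-256 checksum over the raw text with no normalization; the function names and the sample message are made up for the illustration.

```python
import hashlib
import random
import string

def raw_checksum(text: str) -> str:
    """Checksum of the message exactly as received (no normalization)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def add_hashbuster(body: str, rng: random.Random, length: int = 8) -> str:
    """Insert a random, meaningless token in the middle of the message."""
    junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    middle = len(body) // 2
    return body[:middle] + " " + junk + " " + body[middle:]

rng = random.Random(0)  # fixed seed so the example is repeatable
template = "Buy cheap watches now, limited offer"
copies = [add_hashbuster(template, rng) for _ in range(3)]

# Each copy carries different junk, so each produces a different checksum.
distinct_checksums = {raw_checksum(copy) for copy in copies}
```

All three copies say the same thing to a human reader, yet none of them shares a checksum with the others or with the original template, so a lookup keyed on the raw checksum misses every one of them.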

Checksum-based filtering methods include the Distributed Checksum Clearinghouse (DCC), a hash-sharing method of spam detection. The logic behind it is that spam is sent to many recipients. When a server receives a message, it generates a checksum for it and posts the hash to a central, collaborative repository. When a server later receives the same message, it checks the online database to see whether that checksum has already been reported. If it has, the spam score for that message is increased.
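The hash-sharing idea can be modeled with an in-memory counter standing in for the central repository. This is a toy model only: it does not use the real DCC protocol or client, and the Clearinghouse class, digest_of, score_message, and the rule that the score equals the number of earlier reports are all assumptions made for the example.

```python
import hashlib
from collections import Counter

class Clearinghouse:
    """In-memory stand-in for a shared checksum repository (hypothetical)."""

    def __init__(self) -> None:
        self.reports: Counter = Counter()

    def report(self, digest: str) -> int:
        """Record one sighting of a checksum and return the running total."""
        self.reports[digest] += 1
        return self.reports[digest]

def digest_of(body: str) -> str:
    """Checksum posted to the repository (SHA-256 is an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def score_message(body: str, clearinghouse: Clearinghouse) -> int:
    """Report the message and score it by how often it was seen before."""
    total = clearinghouse.report(digest_of(body))
    return total - 1  # number of earlier reports of the same checksum

shared = Clearinghouse()
bulk_scores = [score_message("Buy cheap watches now", shared) for _ in range(3)]
personal_score = score_message("Lunch at noon?", shared)
```

Here the three copies of the bulk message score 0, 1 and 2 in turn, while the one-off personal message scores 0. A real deployment would compare the score against a threshold before rejecting anything.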

Vipul's Razor and Recurrent Pattern Detection are other products that also depend on the fact that spam messages are sent in bulk.

