01 Aug 09

Kill those spam followers! – BETA

It happens to everyone on Twitter: spam/fake followers. Do you want to get rid of them?

Inspired by this post, I started working on an anti-spam app to block these spammers. My app checks your followers list against the three points explained in that post:

  1. Names with a number
  2. A 10/1 ratio of friends/followers
  3. Posts via the API
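To make the three checks concrete, here is a minimal sketch of how they could be combined into one percentage. It is not the actual antispam.py: the function name `spam_score`, the `WEIGHTS` values and the input fields are assumptions chosen for illustration, and fetching the follower data from Twitter is left out.

```python
import re

# Hypothetical weights per check; the real antispam.py may score differently.
WEIGHTS = {"name": 30, "ratio": 40, "source": 30}


def spam_score(screen_name, friends, followers, sources):
    """Return (total percentage, per-check results) for one follower.

    friends   -- number of accounts this user follows
    followers -- number of accounts following this user
    sources   -- source labels of the user's recent posts, e.g. ["web", "API"]
    """
    results = {"name": 0, "ratio": 0, "source": 0}

    # Check 1: a digit anywhere in the name.
    if re.search(r"\d", screen_name):
        results["name"] = WEIGHTS["name"]

    # Check 2: follows at least 10 times as many accounts as follow back.
    # max(..., 1) avoids treating 0 followers as an automatic hit.
    if friends >= 10 * max(followers, 1):
        results["ratio"] = WEIGHTS["ratio"]

    # Check 3: every recent post came in via the API.
    if sources and all(source == "API" for source in sources):
        results["source"] = WEIGHTS["source"]

    return sum(results.values()), results


total, per_check = spam_score("cheapmeds4u", 1680, 106, ["API", "API"])
print(total, per_check)
```

With these made-up weights, a follower that trips all three checks ends up at 100, and one that trips none ends up at 0.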

The app is currently in the testing phase, and I need your help! I tested it on my own little followers list, but who knows what happens when you try it on your followers!

Currently the app only checks your followers and gives some output. I need you to run the app and report false positives/negatives. I’ll explain how you can try this in a moment. The output looks like this:

Username (-- SPAM) <-- name and if it's spam or not
63 <-- percentage of spaminess
{'individual': 23, 'test': 0, 'results': 40} <-- results per test

Instructions for testing my anti-spam app:

  1. If you don’t have a Python environment set up, roughly follow this howto and another one (and google for PYTHONPATH afterwards).
  2. Open a terminal and type: easy_install simplejson
  3. Download antispam.py from the box.net widget
  4. Get back to your terminal and type: cd /the/dir/where/my/app/is
  5. Open antispam.py in a text editor and replace USERNAME and PASSWORD at the bottom with your Twitter username and password
  6. Get back to your terminal and type: python antispam.py
  7. Watch the output roll over your screen…

If you get stuck somewhere, just leave a comment. If you found a false positive/negative, also leave a comment including a part of the output.

If you’re really confident that the app works correctly, you can remove the # sign before user.tag_as_spam and user.block_user at the bottom of the file. From then on my app is really going to block and report people!

Once enough people have tested this version, I’ll make an easy-to-use exe with a graphical user interface (which would eliminate the four toughest steps). The Python version will still be available for Mac and Linux users.

5 Responses to “Kill those spam followers! – BETA”


  1. Cool Geek
    August 1, 2009 at 5:02 pm

    Hey!
    Thank you for linking to my post.
    Just some comments on this:

    People using the YoTwits service also show up as “from API”.

    As is being discussed on my post, another pattern to look for in order to implement a spam sniffer would be the number of interactions, i.e., mentions (the use of the @ symbol) and RTs.

    People new to Twitter do a lot of RTs, if they are legit.

    All in all great work from you!

    Maybe setting up a web-based service with this would help, since not everyone understands the process. Using OAuth would be best.

    I would help you check this but my account is clean 😉

    Cheers,
    Fernando

  2. pepijndevos
    August 1, 2009 at 6:57 pm

    Hi,

    I think this is because YoTwits does not use OAuth. But I’ve added an extra check to the source. If the user posts from multiple sources he/she is instantly granted a 0% for the source check.

    You are suggesting bots do not mention and ReTweet people? I’ll definitely look into that…
    [edit] I’m not sure what you meant… I found a bot that mentioned seemingly legitimate people… This is going to be a tricky one to implement.

    I’ll make my ‘Twitter tool-chain’ – as I call it – into a web service someday, but first I want to see some test results from other people. I’ve had only 5-10 followers to test it with so far. I fear that people with a number in their name who use YoTwits (or the like) will get caught (I’ve added a whitelist feature).
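The extra source check and the whitelist described in this comment could look roughly like the sketch below. The function name `source_check`, the `"API"` label, the default weight and the lowercase whitelist are all assumptions for illustration, not the code that ships in antispam.py.

```python
def source_check(sources, screen_name, whitelist=frozenset(), weight=30):
    """Score the 'posts via the API' check for one follower.

    sources   -- source labels of the user's recent posts
    whitelist -- lowercase screen names that should never be flagged
    weight    -- hypothetical score given when the check fires
    """
    # Whitelisted users always pass.
    if screen_name.lower() in whitelist:
        return 0

    distinct = set(sources)

    # Posting from more than one source instantly earns a 0 for this check.
    if len(distinct) > 1:
        return 0

    # Only a user whose posts all come from the API is flagged.
    return weight if distinct == {"API"} else 0


print(source_check(["API", "API"], "bot123"))
print(source_check(["API", "web"], "someone"))
print(source_check(["API"], "YoTwitsFan", whitelist={"yotwitsfan"}))
```

A user who only ever posts through a service that shows up as “from API” would still be flagged here, which is exactly the case the whitelist is meant to cover.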

  3. Cool Geek
    August 1, 2009 at 11:12 pm

    This will be tricky for sure.
    Someone just commented on my post that TweetLater tweets also show up as coming from the API. So those users will get caught on the spam trail.

    Another pattern that I saw was the periodicity of the tweets, usually once every hour. This can help to identify those accounts and to narrow the threshold.

    I will keep searching for patterns and will let you know if anything worthy comes around.

    Cheers,
    Fernando

  4. pepijndevos
    August 2, 2009 at 10:56 am

    I noticed this, and also that a lot of bots just post from the web.* Even the name test isn’t watertight; I’ve seen a lot of bots with names like webhostingforlife. (The Akismet filter isn’t working at all yet.)

    Sure, I could have guessed that… Bots are likely to use cron for their postings. This could be done… check the latest 10 posts for interval. regular = bot, irregular = user.

    So the result is basically that the friends/followers test is by far the best, considering how easy it is to run. I think I’ll build my web service around this method and make the download/pro version check for more things. (Checking 10 posts for RT and @ patterns, source and interval could use a lot of server resources.)

    *I could easily write a lib to do Twitter things without the API (and its rate limit) by just using their web interface. Actually, I already retrieve the friends/followers counts this way because the API leaves out inactive users. For one bot the API results were 39/0, while the web showed me 1680/106. Another pattern!
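The interval idea from this comment (regular = bot, irregular = user) can be sketched as follows. This is only one possible way to measure regularity: it compares the spread of the gaps between the latest 10 posts to their average. The function name `looks_scheduled` and the 10% threshold are assumptions, and the timestamps are taken as given rather than fetched from Twitter.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def looks_scheduled(timestamps, max_cv=0.1):
    """Return True if the gaps between the latest 10 posts are nearly constant.

    timestamps -- datetime objects of a user's posts, in any order
    max_cv     -- hypothetical threshold: allowed spread of the gaps,
                  as a fraction of the average gap
    """
    latest = sorted(timestamps)[-10:]
    gaps = [(b - a).total_seconds() for a, b in zip(latest, latest[1:])]

    # Too few posts (or identical timestamps) say nothing about regularity.
    if len(gaps) < 3 or mean(gaps) == 0:
        return False

    # Coefficient of variation: small means cron-like posting.
    return pstdev(gaps) / mean(gaps) <= max_cv


base = datetime(2009, 8, 1, 12, 0)
hourly = [base + timedelta(hours=i) for i in range(10)]
print(looks_scheduled(hourly))  # True: every gap is exactly one hour
```

A bot that adds random jitter to its schedule would slip past a tight threshold, so this is best treated as one signal among several.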

  5. pepijndevos
    August 2, 2009 at 11:20 am

    I was just browsing #nomercy to search for some patterns. I found a whole army of people who have as many followers as friends…

    So what am I going to do now… Some of them don’t post from the API, have just as many followers as friends, and have no numbers in their name! This would bypass all my filters!

    My options include:
    Regularity: Bots do a lot of things a lot of times or in a similar way (one source, one message, one interval).
    Keywords: Looking for ‘adult profile’, ‘x-rated’, ‘money’, etc.
    Blacklisting: If I’m going to do a web service, it’s easy to keep a database of bad users. Even if they pass the filters, they’ll only do so once (and get reported by the user afterwards).

    Hey, how about this one:
    There are techniques for determining the language of a text by matching the frequency of certain characters.
    How about making such a list with words used in spam tweets? This could work, and be a self-learning system!

    I’ll do some research on how other spam filters do their job…
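One common way to build such a self-learning word list is a naive Bayes filter: count how often each word shows up in tweets marked as spam and in tweets marked as legitimate, then compare the two. The sketch below is a minimal version of that general technique, not something taken from antispam.py; the class name `WordFilter`, its methods and the training tweets are made up for illustration.

```python
import math
import re
from collections import Counter


class WordFilter:
    """Tiny naive Bayes filter that learns word frequencies from labelled tweets."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.tweets = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Lowercase words; keeps hyphenated terms such as "x-rated" together.
        return re.findall(r"[a-z0-9'-]+", text.lower())

    def train(self, text, label):
        """Learn from one tweet; label is "spam" or "ham" (legitimate)."""
        self.words[label].update(self.tokens(text))
        self.tweets[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        spam_total = sum(self.words["spam"].values())
        ham_total = sum(self.words["ham"].values())

        # Start from how common spam is overall (add-one smoothing).
        log_odds = math.log((self.tweets["spam"] + 1) / (self.tweets["ham"] + 1))

        # Each word shifts the odds towards the class where it is more frequent.
        for word in self.tokens(text):
            p_spam = (self.words["spam"][word] + 1) / (spam_total + vocab)
            p_ham = (self.words["ham"][word] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)

        return 1 / (1 + math.exp(-log_odds))


f = WordFilter()
f.train("make money fast with this x-rated adult profile", "spam")
f.train("free money click here adult profile", "spam")
f.train("reading a good python book today", "ham")
f.train("anyone going to the conference today", "ham")
print(f.spam_probability("easy money adult profile"))
print(f.spam_probability("python conference today"))
```

Every tweet a user reports can be fed back through `train`, which is what makes the list self-learning. With only four training tweets the numbers mean little; the point is that the first example scores well above 0.5 and the second well below.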


My blog has moved!

My blog has permanently moved to a self hosted Wordpress at http://pepijndevos.nl

This blog will stay around for accidental search engine visitors.
