Data mining in the twittersphere

I realised I’m spending an increasing amount of time online. The online world does seem to be developing into some form of alternate existence. I went outside for a walk, and it felt a little alien. Everyone was moving really slowly and it suddenly struck me that there wasn’t very much information coming in from my immediate surroundings…

It’s somewhat cool to spend a lot of time in information-overload mode; it makes relaxing seem easier (even in a hectic city-lifestyle).

To further my explorations, I performed a tentative investigation of Twitter. Hmmm, I really should work harder at this early adopter thing :S So I got thinking about Twitter and things you could do with it…


There is so much information available. All that information can be used… as metrics for ‘things’. For example, let’s think about, say, ‘Starbucks’. If Twitter feeds link Starbucks with ‘good’ or ‘amazing’ or ‘:)’, then those associations can be turned into a ‘worth’ for the item, a metric of public opinion based on association. You could do the same thing with politicians, banks, countries, decisions etc. A bit like an opinion-stock-market.
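As a rough illustration of the ‘worth’ idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the tiny word lists, the `tokenize` and `opinion_score` names, and the scoring rule (positive minus negative hits, divided by total hits, over tweets that mention the topic). A real study would need a far better lexicon and a real feed of tweets.

```python
# Hypothetical word lists, purely for illustration.
POSITIVE = {"good", "amazing", "great", ":)"}
NEGATIVE = {"bad", "awful", "terrible", ":("}
OPINION_WORDS = POSITIVE | NEGATIVE


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Tokens that are already opinion words (including emoticons such as ':)')
    are kept untouched so the punctuation stripping does not destroy them.
    """
    tokens = []
    for raw in text.lower().split():
        if raw in OPINION_WORDS:
            tokens.append(raw)
        else:
            cleaned = raw.strip(".,!?;:\"'()")
            if cleaned:
                tokens.append(cleaned)
    return tokens


def opinion_score(tweets, topic):
    """Return a score in [-1, 1] for `topic`, or 0.0 if there is no signal.

    Only tweets containing the topic as a whole token are considered.
    """
    topic = topic.lower()
    pos = neg = 0
    for tweet in tweets:
        tokens = tokenize(tweet)
        if topic not in tokens:
            continue
        pos += sum(1 for t in tokens if t in POSITIVE)
        neg += sum(1 for t in tokens if t in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "Starbucks coffee is amazing :)",
        "Queue at Starbucks was awful",
        "My bank is terrible",
    ]
    print(opinion_score(sample, "Starbucks"))
```

Plain co-occurrence like this ignores negation and sarcasm ("not good" counts as positive), so at best it gives a crude first signal.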

Also, why not try direct comparison metrics: see how many times ‘Google’ is mentioned compared to ‘Microsoft’.
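A direct comparison like this is easy to sketch in Python. The `mention_counts` function below is a hypothetical helper, not part of any Twitter API, and it assumes the tweets are already available as a list of strings. It counts case-insensitive whole-word matches, so ‘googled’ would not count as a mention of ‘Google’.

```python
import re
from collections import Counter


def mention_counts(tweets, names):
    """Count case-insensitive whole-word mentions of each name across tweets."""
    counts = Counter({name: 0 for name in names})
    for name in names:
        # \b anchors keep 'Google' from matching inside 'googled'.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for tweet in tweets:
            counts[name] += len(pattern.findall(tweet))
    return counts


if __name__ == "__main__":
    sample = [
        "Just Google it",
        "Microsoft and Google both announced something today",
        "I googled it already",
    ]
    print(mention_counts(sample, ["Google", "Microsoft"]))
```

Raw counts only say who gets talked about more, not whether the talk is favourable, so this complements an opinion-style metric instead of replacing it.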

It’s also a lovely dataset for training an AI: by extracting information from feeds, it could learn more about human behaviour and reactions.

You could use it as a metric of general literacy / intelligence, or as a way of tracking the evolution of language. Tracking internet memes would also be fairly easy, as would tracking society’s moods based on location (if you could also log IPs, for instance), along the lines of this recent study on well-being in European countries.

You could do all this with blogs too, but I’m guessing that the blogosphere would give a rather biased subset of the general population in several ways. In addition, blog posts tend to be longer and as such may be more difficult to mine for data or trends.

It would potentially be difficult to extract absolute conclusions from this type of study, but monitoring how a metric changes over time (i.e. performing some form of normalisation, so that you look at relative changes and not raw values) would be very interesting. If anyone knows of any studies along these lines, I’d be interested to read them.
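One possible reading of ‘normalisation’ here, and it is only my assumption, is to divide the number of tweets mentioning a topic by the total number of tweets seen that day, so that a busy day on Twitter doesn’t look like a surge of interest. A minimal Python sketch, with a hypothetical `daily_share` function and tweets supplied as `(date, text)` pairs:

```python
from collections import defaultdict
from datetime import date


def daily_share(tweets, topic):
    """Return {day: fraction of that day's tweets that mention `topic`}.

    `tweets` is an iterable of (date, text) pairs. Matching is a simple
    case-insensitive substring test, which is crude but enough for a sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    needle = topic.lower()
    for day, text in tweets:
        totals[day] += 1
        if needle in text.lower():
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (date(2009, 1, 1), "Starbucks again this morning"),
        (date(2009, 1, 1), "Raining in London"),
        (date(2009, 1, 2), "New Starbucks opened nearby"),
        (date(2009, 1, 2), "Monday already"),
        (date(2009, 1, 2), "Train delayed"),
        (date(2009, 1, 2), "Lunch time"),
    ]
    for day, share in daily_share(sample, "Starbucks").items():
        print(day, share)
```

The dates and tweets in the sample are made up. The point is that the absolute share on any one day means little, while the way it moves from day to day is the part worth watching.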

3 thoughts on “Data mining in the twittersphere”

  1. rrtucci says:

    A poem about twittering that I heard on NPR: “TweetNot” (based on Dr. Seuss’s Green Eggs and Ham)

  2. dark_daedalus says:

    A lot of geeks seem to be using the open API to have hardware and bits of the “Big Blue Room” twitter…

    A bit “ubiquitous computing” and fun, but lacks utility IMHO.

    The problem with extracting useful data from Twitter is similar to the Google issue.

    The metric used to judge “worth” is based on a trust network, extracted from the metadata of the system, i.e. in Google’s case, hyperlinks.

    But this tends to assign value to the “Popular” over the “Good”.

    This is a problem which has been discussed at length in metaphysics since Plato’s time, with no real solution yet.

    Interestingly, the current economic crisis can also be viewed as a problem linked to this.

    The problem was the quant formulae were designed to reduce the negative “risk”, rather than track the positive “trust”… I mean Do You Trust Your Bank Manager? If so, why?

  3. […] MIT Technology Review on mining data from social networks. Interesting because I mentioned this a while back […]
