Automating Classification of User Sentiment is the Key to Unlocking Social Media

Rush Limbaugh has 33% more buzz online than Jay Leno, according to the Vitrue Social Media Index. Tweets about social game company Zynga come from posters with three times as many followers as tweets about its competitor Playfish, according to Radian6 data from the last 30 days. But none of this means much unless you know what percentage of that buzz or those tweets is positive or negative.

Sentiment is the guiding light that helps marketers and their organizations know what to do with the mountain of social media content that is growing exponentially through Facebook and Twitter. We need to understand our current positive-to-negative ratios, drill down to uncover what’s driving them, then put programs in place to mitigate the negative and foster growth of the positive. Just as we determine the ROI on capital improvements, with an accurate measurement of positive and negative sentiment we can measure the return on investing in programs to improve consumer sentiment about our brands.

So the vision thing is great, but the reality of going through thousands of posts and tweets to determine sentiment is the biggest impediment keeping marketers and their organizations from moving forward. The industry-average accuracy for automatically categorizing the sentiment of user posts is about 60%, based on a conversation with Radian6.

Radian6 is a great workflow tool, but today it only offers users the ability to hand-code the sentiment of each post. Its goal is to introduce automated sentiment attribution into the Radian6 dashboard this summer and reach 70 to 75% accuracy.

Crimson Hexagon has shown, through research by co-founder and Harvard Professor Gary King, that its approach to automatically estimating the percentage of blog posts in each category is more accurate than hand-coding or strictly counting words. “Crimson Hexagon doesn’t count words, which can mislead; it amplifies human judgment to give the percentage in each category accurately,” noted King in a recent tweet. One of King’s colleagues noted that Crimson Hexagon achieves close to 80-88% accuracy for positives and negatives using this approach.

Very few marketers and organizations will spend the resources to hand-code responses (in fact, King’s research suggests that one shouldn’t, and that returns typically diminish after 500 hand-coded responses). We will rely more and more on automated tools to do this work for us. Thus, when the time comes to pick vendors, agencies and tools to help us measure sentiment, it is critical for us to understand the underlying data, methodology and resulting accuracy rates.

To date, Crimson Hexagon’s methodology seems to show the most promise. What other tools are you using to identify positive and negative sentiment about your brand, and what is their underlying methodology?

One thought on “Automating Classification of User Sentiment is the Key to Unlocking Social Media”

  1. Thanks for the informative post.

    I thought you might be interested to know that the technology of Crimson Hexagon can do two other things, in addition to what you list.

    First, even if the theoretically best possible classifier can only get 80% or 70% or even 10% of documents (tweets, blog posts, emails, etc.) put into the correct categories, our methodology will still give the correct percentage of positive and negative documents. In most situations, users don’t care about classifying any individual document, and instead are interested only in characterizing all of them; we can do that in a completely unbiased way even if the best classification-based approaches fail. It’s a different way of looking at the problem: we say that instead of trying to find the needle in the haystack, the goal is usually to characterize the haystack. By tuning our methodology to the quantity of real interest, we can make a (public, but patent pending) correction to get the right answer.

    Second, Crimson Hexagon technology lets you use any categories you want, not only positives and negatives, and will still accurately characterize the percentage of documents in each and every category. So the categories can be topics, attitudes, sentiment, opinions, geographic areas, views, or anything else.
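
The comment's first point, that category percentages can be estimated well even when individual documents are often misclassified, can be illustrated with a small sketch. This is not Crimson Hexagon's patent-pending method, which the post does not describe. It is the simpler, generic "adjusted classify and count" correction, swapped in here only to show the idea. The function names and the sample numbers are hypothetical, and the sketch assumes the classifier's error rates on a hand-coded sample carry over to the unlabeled documents.

```python
def classify_and_count(predictions):
    """Naive estimate: share of documents the classifier labeled positive (1)."""
    return sum(predictions) / len(predictions)


def estimate_error_rates(true_labels, predictions):
    """Estimate true/false positive rates from a hand-coded sample."""
    on_positives = [p for t, p in zip(true_labels, predictions) if t == 1]
    on_negatives = [p for t, p in zip(true_labels, predictions) if t == 0]
    tpr = sum(on_positives) / len(on_positives)  # positives correctly flagged
    fpr = sum(on_negatives) / len(on_negatives)  # negatives wrongly flagged
    return tpr, fpr


def adjusted_count(predictions, tpr, fpr):
    """Correct the naive share using the classifier's known error rates.

    observed = tpr * p + fpr * (1 - p)  =>  p = (observed - fpr) / (tpr - fpr)
    """
    if tpr == fpr:
        raise ValueError("classifier is uninformative (tpr equals fpr)")
    corrected = (classify_and_count(predictions) - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, corrected))  # keep the estimate a valid share


# Hypothetical hand-coded sample: 10 positive and 10 negative posts.
# The classifier catches 7 of the positives and wrongly flags 2 negatives.
sample_truth = [1] * 10 + [0] * 10
sample_preds = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8
tpr, fpr = estimate_error_rates(sample_truth, sample_preds)

# Hypothetical unlabeled set: 1,000 posts of which 300 are truly positive.
# With the error rates above the classifier flags 210 + 140 = 350 of them.
unlabeled_preds = [1] * 350 + [0] * 650

naive = classify_and_count(unlabeled_preds)
corrected = adjusted_count(unlabeled_preds, tpr, fpr)
print(f"naive share positive:     {naive:.2f}")
print(f"corrected share positive: {corrected:.2f}")
```

In this made-up example the classifier mislabels a quarter of the unlabeled posts and the naive count overstates the positive share, yet the corrected figure recovers the true 30%. The correction is only as good as the error rates estimated from the hand-coded sample.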
