Cyber Summit 2012: A bit of big data and a lot of small tweets

Last week (October 1-3), MPK Analytics attended the annual Cyber Summit in Banff. The theme for this year was "Leading the Way in the Age of Big Data". As might be expected when you bring together this many enthusiastic and awesome data scientist nerds, the meeting was heavily tweeted, with a total of 525 tweets carrying the official hashtag #cybersummit. Since this was a gathering of minds at the leading edge of big data and analytics, it seems like a no-brainer that the Twitter data generated during the meeting ought to be analyzed. So here it is: an overview, breakdown, and analysis of the Twitter data generated by participants at the Cyber Summit.

summit blog pic1

The time series of the hourly number of tweets shows several interesting patterns. The highest rate of tweeting occurred during the Opening Plenary (Is data the next oil?), closely followed by Andrew Hessel's Keynote Presentation. Although the total number of tweets during Andrew Hessel's presentation was somewhat lower than during the Opening Plenary, he was flying solo while the Opening Plenary had four presenters, so Andrew Hessel obtained the highest number of tweets per presenter. His presentation was excellent and is available on Cybera's YouTube channel.
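For readers who want to build a similar hourly series, here is a minimal sketch in Python. Our own analysis was done in R, so this is not the code behind the figure; the timestamps are made up for illustration, and the function names tweets_per_hour and tweets_per_presenter are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps for illustration only; not the summit data.
sample_timestamps = [
    "2012-10-01 09:05:12",
    "2012-10-01 09:41:50",
    "2012-10-01 09:59:03",
    "2012-10-01 10:15:27",
    "2012-10-02 14:02:44",
]


def tweets_per_hour(timestamps):
    """Count tweets in each clock hour, returned in chronological order."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour so tweets in the same hour share a key.
        counts[when.replace(minute=0, second=0)] += 1
    return sorted(counts.items())


def tweets_per_presenter(tweet_count, presenters):
    """Normalize a session's tweet count by its number of presenters."""
    return tweet_count / presenters


hourly = tweets_per_hour(sample_timestamps)
for hour, count in hourly:
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Dividing a session's count by its number of presenters, as tweets_per_presenter does, is the normalization behind the comparison between a solo keynote and a four-person plenary.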

It is a bit surprising that Hilary Mason's keynote did not generate more Twitter activity. It was an excellent, polished, and funny talk providing lots of neat insights obtained from big data. Perhaps people were so mesmerized by her presentation that they forgot to tweet? Another discernible pattern is that the "peaks" tend to get lower as the meeting progressed, with the Closing Plenary receiving only a fraction of the tweets the Opening Plenary received. Were people getting tired? Sore thumbs? Participants leaving early?

summit blog pic2

So who tweeted the most during the summit? Looking at the top twitterati shows that @cybera, the organizer of the summit, posted a total of 123 tweets (23%), far out-tweeting everyone else. Second place went to yours truly, @MPKAnalytics, with 76 tweets (15%), and third place went to @SeawaBob with 47 tweets (9%). All in all, 47% of all tweets were posted by the top three twitterati and 88% of all tweets by only 20 participants (individuals or organizations).
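The shares quoted above are simple proportions of the 525 tweets. As a quick check, here is a small Python sketch (again, not our original R code) that recomputes them from the counts reported in this post; the helper name share is hypothetical.

```python
TOTAL_TWEETS = 525  # total #cybersummit tweets reported in the post

# Tweet counts for the top three accounts, as reported in the post.
top_accounts = {
    "@cybera": 123,
    "@MPKAnalytics": 76,
    "@SeawaBob": 47,
}


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for account, count in top_accounts.items():
    print(f"{account}: {count} tweets ({share(count, TOTAL_TWEETS):.1f}%)")

combined = sum(top_accounts.values())
print(f"Top three combined: {combined} tweets ({share(combined, TOTAL_TWEETS):.1f}%)")
```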

summit blog pic3

What were the most commonly used hashtags? Since #cybersummit was used to find the tweets associated with the summit in the first place, it is obviously going to be found in all 525 tweets, so we are excluding it here. Looking at the top hashtags shows, perhaps unsurprisingly, that #bigdata was the most common (used 60 times), followed by #andrewhessel (17 times) and #whereistcelabnow (6 times).
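Counting hashtags while excluding the search tag can be sketched as follows. This is an illustrative Python version, not our R code: the sample tweets are invented, the name count_hashtags is hypothetical, and folding tags to lowercase before counting is an assumption about how one might want to group them.

```python
import re
from collections import Counter

# Hypothetical tweets for illustration only; not the summit data.
sample_tweets = [
    "Is data the next oil? #cybersummit #bigdata",
    "Great keynote! #CyberSummit #BigData #andrewhessel",
    "Heading to the closing plenary #cybersummit",
]

HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(tweets, exclude=("cybersummit",)):
    """Count hashtags case-insensitively, skipping the tags in `exclude`."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG.findall(text):
            tag = tag.lower()  # assumption: treat #BigData and #bigdata as one tag
            if tag not in exclude:
                counts["#" + tag] += 1
    return counts


top_tags = count_hashtags(sample_tweets).most_common(3)
print(top_tags)
```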

Finally, just because we can do it, we can also visualize all the tweets as a word cloud.

summit blog pic4

The analysis of the Cyber Summit 2012 tweet data was done entirely in R, using packages for parsing the scraped XML markup (XML), mining the tweets (tm), and generating the word cloud (wordcloud). The R code and associated data are freely (free as in free hugs and as in freedom) available at the official GitHub page of MPK Analytics.