Testing improved data retrieval on Cybera’s Rapid Access Cloud

The annual Mining Software Repositories (MSR) conference is taking place this week, from May 18-20. The event brings together global experts in data science, machine learning, and artificial intelligence to explore interesting and actionable ways to improve software development practices. In the lead-up to MSR, a hackathon was organized. This year, it focused on GrimoireLab (GL), a toolset designed to help retrieve, analyze, and visualize data from tools that support software development.

Kalvin Eng, a PhD student in computing science at the University of Alberta, participated in the hackathon and used Cybera’s Rapid Access Cloud to complete his project. Here, he outlines how he and his lab partner, Hareem Sahar, approached GL, and how their work can help software engineering researchers more easily extract and process software development data.

“GrimoireLab helps users mine different data from different sources, and we were interested in seeing how it applies to GitHub and Gitter,” says Eng.

GitHub is a service that hosts repositories of software code, and Gitter is a chat room platform for users and developers of GitHub repositories.

“I’m really interested in how people use and develop software, and Hareem previously wrote a study on understanding the usage of Gitter and its relation to GitHub repository issue reports,” says Eng. “We wanted to see if GL could replicate that study, but in an easier way. 

“GrimoireLab is really good at standardizing how data is retrieved, which is great for researchers looking to replicate others’ work. We wanted to see if we could use the tool to retrieve the same Gitter comments and GitHub data the other study retrieved.”

The team ran their GL instance on the Rapid Access Cloud, a free cloud resource available to Alberta researchers and classrooms.

“Using GL, we were able to retrieve the same Gitter and GitHub data used by the previous study, but in a nice, standardized format, which we were able to run our analysis scripts on,” says Eng. “Previously, to retrieve that kind of data, we had to write our own scripts, and figure out how to run them. This tool takes your configurations and retrieves data by itself. It’s much faster and easier.”

Standardized retrieval of this kind opens up new opportunities for software engineering researchers who need to pull different datasets in an easy, consistent format.

Hands-on support with Rapid Access Cloud

Eng has been using the Rapid Access Cloud for several years, so Cybera’s cloud was a natural choice for this project. “GrimoireLab is relatively resource heavy, and you need fast Internet speeds to effectively run it, which is where I appreciate the direct connection of the Rapid Access Cloud to the R&E Network backbone.”

“I also appreciate how accommodating and easy to reach the Cybera team is. And it’s free!”

For more details on Eng and Sahar’s hackathon project, visit the MSR conference website.