Writing the Book on How to Use ‘Big Data’

August 8, 2013 Matt Shipman

Nagiza Samatova, Arpan Chakraborty and Kanchana Padmanabhan. (Image: Marc Hall)

NC State students wrote the book on analyzing “big data” – sifting useful information out of the sea of business, personal and other data available online and elsewhere. Or at least they’ve written a book about mining that big data.

Nagiza Samatova, a professor of computer science at NC State, and four Ph.D. students are co-editors of a book released this month that is a how-to guide for anyone interested in learning how to analyze big data. More than 50 other undergraduate and graduate students contributed as co-authors.

The book focuses on “graphs,” a computer science term for a model that shows the connections between entities, whether those entities are people or power stations. These graphs can be visual if the networks they represent are small enough. But for large data sets, such as all the people who shop at Amazon.com, it is easier to create digital, machine-readable graphs.
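As a rough illustration of what a machine-readable graph might look like, the sketch below stores a tiny, made-up network as an adjacency list in Python. The names, the connections and the `build_graph` helper are hypothetical and are not taken from the book, which works in R.

```python
from collections import defaultdict

# Hypothetical connections: each pair links two entities in the network.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
]

def build_graph(edge_list):
    """Return an undirected graph as a dict mapping each node to its set of neighbors."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)  # record the connection in both directions
        graph[b].add(a)
    return dict(graph)

graph = build_graph(edges)
```

Looking up `graph["alice"]` returns the set of entities directly connected to that node, which is the kind of question that becomes impractical to answer by eye once a network has millions of members.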

These digital graphs can be “mined” to identify patterns for various applications. For example, online retailers could mine a graph to suggest products a user may be interested in based on previous purchases, and social media platforms could suggest contacts and job opportunities based on a user’s demonstrated preferences.
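One simple way such a product suggestion might work is sketched below in Python. It treats purchase histories as a graph linking users to products, then scores products bought by users whose purchases overlap with the target user's. The data, the `recommend` function and the scoring rule are all assumptions made for illustration; this is not the method described in the book or used by any particular retailer.

```python
from collections import Counter

# Hypothetical purchase histories: each user is linked to the products bought.
purchases = {
    "u1": {"tent", "stove", "lantern"},
    "u2": {"tent", "stove"},
    "u3": {"tent", "lantern", "boots"},
    "u4": {"novel"},
}

def recommend(user, purchases, top_n=2):
    """Suggest products bought by users who share purchases with `user`."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # number of shared purchases
        if overlap == 0:
            continue  # no connection through a common product
        for item in items - owned:
            scores[item] += overlap  # weight by how similar the other user is
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

suggestions = recommend("u2", purchases)
```

Here the user `u2` has bought a tent and a stove, so the products that rise to the top are the ones bought by other shoppers with similar histories. A user with no purchases in common with anyone else gets no suggestions.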

“This approach to analyzing large data sets – so-called ‘big data’ – is an important field in computer science, with applications in areas from climate modeling to data security to the business community,” says Kanchana Padmanabhan, an NC State Ph.D. student and co-editor of the book. “We wanted to see an introductory book that walks people through graph mining, so we decided to create it ourselves.”

Samatova first presented the idea to students in her data-mining course. When the students expressed interest in the idea, Samatova encouraged them to help her develop the book.

“Our goal was, in part, to create a book that could be used outside the classroom,” says Arpan Chakraborty, an NC State Ph.D. student and co-editor of the book. “But we also wanted to come up with something that could be used by instructors in data-mining courses.”

Samatova’s first step was to ask the students in her course what they would want to see in the book, with the idea of using their suggestions to organize the content.

“We found that students wanted to be sure the book made no assumptions about how much readers already knew,” Padmanabhan says. The students – all future co-authors or co-editors – chose to focus on practical steps with real-world examples and applications, so that readers would understand how the various elements of graph mining can be used.

The students were then split into groups, with each group focusing on a specific aspect of graph mining. Ultimately, those groups each contributed a chapter to the final book.

The book, “Practical Graph Mining with R,” was published by CRC Press on July 23 as part of its series on data mining and knowledge discovery. The other co-editors are Ph.D. student John Jenkins and former Ph.D. student William Hendrix, who is now on the faculty at Northwestern University.

The work was done with support from the United States Department of Energy’s Scientific Data Analysis and Visualization (SDAV) Institute and the National Science Foundation’s Expedition in Computing on Understanding Climate Change.

The book should benefit computer science students everywhere. But it will certainly benefit the students who helped create it. With the growing importance of big data analysis, finding a job will be a lot easier for those who wrote the book about it.