Computer science prof and undergrad duo explore social phenomena on Reddit and GitHub

U of T undergrad Isaac Waller (left) with Ashton Anderson, an assistant professor in UTSC's department of computer and mathematical sciences. (Photo by Diana Tyszko)

Jovana Jankovic

When we log on to any social media account, we are immediately immersed in our chosen groups, forums, events and conversations. But how does the nature of our engagement with online communities determine how we’ll behave on those platforms in the future, what kinds of interactions we’ll be part of and what types of information we’ll end up taking away?

These are the questions that U of T Scarborough assistant professor of computer science Ashton Anderson and undergrad student Isaac Waller are trying to answer.

Waller is a fourth-year Woodsworth College student in computer science and sociology, two disciplines he didn’t initially think were compatible. But then he came across Anderson’s work in computational social science, an area of research in which computers are used to simulate, model and analyze social phenomena such as human behaviour and communication.

“I sent him an email completely out of the blue,” says Waller, “and we began to meet on campus. Eventually he recommended that we work together on a research project.”

Waller and Anderson decided to work together on a faculty-student computer science project—an opportunity for undergraduates to work one-on-one with a faculty member on a project of mutual interest for a course credit.

“Like most people these days, I’ve been a part of many online communities, some of which feel as real as in-person communities,” says Waller. “Studying how they relate to each other and how they influence the people within them is fascinating.”

“Activity diversity” and user behaviour: who engages where, how and for how long?

In their project, Waller and Anderson took data from the top 10,000 subreddits on Reddit—which account for 96.8 per cent of all comments—and the top 40,000 GitHub communities by number of stars, which are used to mark interesting and relevant content.

They then applied a machine learning model to this data, producing embeddings (“sort of like maps,” says Waller) that show the relationships between various online communities.

“Our technique is inspired by the way computational linguists measure how similar words are by looking at which words appear in similar contexts,” says Waller. “In the same way that words with similar usage patterns tend to be similar, communities with similar members tend to be related.”
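
To make the analogy concrete, here is a minimal sketch in Python. It is not the authors' model: instead of a learned embedding, it simply represents each community by how often each user commented there, then compares communities with cosine similarity. The user names, community names and comment counts are invented for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: one (user, community) pair per comment.
comments = [
    ("ann", "python"), ("ann", "rust"), ("ann", "python"),
    ("bob", "python"), ("bob", "rust"),
    ("cat", "baking"), ("cat", "cooking"),
    ("dan", "baking"), ("dan", "cooking"), ("dan", "cooking"),
]

def community_vectors(pairs):
    """Map each community to a Counter of comment counts per user."""
    vectors = {}
    for user, community in pairs:
        vectors.setdefault(community, Counter())[user] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[user] for user, count in a.items() if user in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = community_vectors(comments)
# Communities that share members score high; disjoint ones score zero.
sim_close = cosine(vectors["python"], vectors["rust"])
sim_far = cosine(vectors["python"], vectors["baking"])
print(f"python vs rust:   {sim_close:.3f}")
print(f"python vs baking: {sim_far:.3f}")
```

In this toy data, the two programming communities share both of their members and come out as close neighbours, while the programming and baking communities share none and come out unrelated. A learned embedding of the kind the researchers describe would compress this sort of signal into a compact map rather than working with raw counts.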

“After exploring different directions in the study of online communities, we eventually got most excited about looking at generalists and specialists,” says Anderson.

Generalists are users who apply themselves broadly to many topics of interest to them—that is, they probably know a little bit about a lot of different things. Specialists, on the other hand, allocate their energy to a narrow area of focus—they are likely to know a lot about very few specific things.
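
One plausible way to put a number on this distinction, sketched below in Python, is to ask how tightly a user's activity clusters on the community map. This is an illustrative assumption, not necessarily the measure used in the paper: the score is the activity-weighted average cosine similarity between each community a user contributes to and the centre of that user's activity. The two-dimensional coordinates, the community names and the function name `diversity_score` are all hypothetical.

```python
from math import sqrt

# Hypothetical 2-D community embeddings (real ones would be learned from data).
embeddings = {
    "python": (1.0, 0.0),
    "rust": (0.9, 0.1),
    "baking": (0.0, 1.0),
}

def unit(vector):
    """Scale a vector to length 1."""
    norm = sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(x / norm for x in vector)

def diversity_score(activity, embeddings):
    """Weighted mean cosine similarity between a user's communities and
    the centre of their activity. Values near 1 suggest a specialist;
    lower values suggest a generalist.

    activity: dict mapping community name to that user's comment count.
    """
    total = sum(activity.values())
    vecs = {c: unit(embeddings[c]) for c in activity}
    dims = len(next(iter(vecs.values())))
    # Activity-weighted centre of the user's communities, normalized.
    centre = unit([
        sum(activity[c] * vecs[c][i] for c in activity) / total
        for i in range(dims)
    ])
    return sum(
        activity[c] * sum(a * b for a, b in zip(vecs[c], centre))
        for c in activity
    ) / total

specialist_score = diversity_score({"python": 8, "rust": 2}, embeddings)
generalist_score = diversity_score({"python": 5, "baking": 5}, embeddings)
print(f"specialist: {specialist_score:.3f}")
print(f"generalist: {generalist_score:.3f}")
```

Under these assumptions, a user who comments only in two neighbouring programming communities scores close to 1, while a user who splits time evenly between programming and baking scores noticeably lower.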

After identifying generalists and specialists, Anderson and Waller began to explore how the two groups differ: how their posts are “liked,” how long they spend on certain platforms or within certain communities, how diverse the sub-populations they interact with are, how predictable their behaviour is and which kinds of communities each group tends to engage with.

Their conclusions?

“Specialists are more likely to produce higher-quality replies” and “stay in communities they contribute to,” write Waller and Anderson in a forthcoming paper on their research, while “generalists are much more likely to remain on the platform as a whole.”

“We found that generalists engage with a significantly more diverse group of people, whereas specialists are exposed to much narrower segments of the population.”

Sharing the research: implications, partnerships and paths for further study

Waller and Anderson will present their novel methodology in their paper, “Generalists and Specialists: Using Community Embeddings to Quantify Activity Diversity in Online Platforms,” at The Web Conference, a prestigious annual international conference about all things relating to the internet. It is attended not only by top-tier academic researchers but also by representatives from tech giants like Google, Netflix and Amazon.

What kind of feedback do Waller and Anderson anticipate?

“Our work is so new that people haven’t had time to follow up on it yet,” says Anderson. “But our methodology and analysis framework can definitely be used to study questions like the extent to which users are living in political echo chambers online. This is the kind of thing we hope to get people excited about. We have open-sourced some tools that Isaac developed to help people follow up on our work.”

Waller will singlehandedly be presenting the paper at the conference, which takes place in San Francisco in late spring. He’s excited to participate in such a high-level event—a rare experience for an undergraduate student.

“I’ve never given a presentation before to this large of an audience,” says Waller. “I hope that our work will be of interest to some people and that I might contribute to the current dialogue in this field. Although it has become a popular topic in the news, research into the ways social media influences how we discover new ideas is still in its early stages.”