Reddit, the “front page of the internet” – and a network I hardly ever dare enter for fear of being sucked in to reading 100s of comments for hours on highly pointless yet entertaining things – has had its share of controversies over the years.
The site is structurally divided up into “subreddits” , which one can imagine just as simple, quite old-school, forums where anyone can leave links and comments, and anyone else can up or downvote them as to whether they approve or not.
Reddit users were themselves busily engaged in a chat regarding “which popular subreddit has a really toxic community” when Ben Bell of Idibon (a company big into text analysis) decided to tackle the same question with a touch of data science.
But what is “toxic”? Here’s their definition.
Ad hominem attack: a comment that directly attacks another Redditor (e.g. “your mother was a hamster and your father smelt of elderberries”) or otherwise shows contempt/disagrees in a completely non-constructive manner (e.g. “GASP are they trying CENSOR your FREE SPEECH??? I weep for you /s”)
Overt bigotry: the use of bigoted (racist/sexist/homophobic etc.) language, whether targeting any particular individual or more generally, which would make members of the referenced group feel highly uncomfortable
Now, text sentiment analysis isn’t all that perfect as of today. The CTO of Datasift who has a very cool social-media-data-acquiring-tool was claiming around 70% accuracy as being about the peak possible, a couple of years ago. The CEO of the afore-mention Idibon claimed about 80% was possible today.
No-one is claiming nearly 100%, especially on such subtle determinations such as toxicity, and their chosen opposite, supportiveness. The learning process was therefore a mix of pure machine science and human involvement, with the Idibon sentiment analysis software highlighting, via the Reddit API, the subreddits most likely to be extreme, and humans classifying a subset of the posts into those categories.
But what is a toxic community? It’s not as simple as simply a place with a lot of toxic comments (although that’s probably not a bad proxy). It’s a community where such nastiness is approved of or egged on, rather than ignored, frowned upon or punished. Here Reddit provides a simple mechanism to indicate this, as each user can upvote (approve of) or downvote (disapprove of) a post.
Their final formula they used to calculate judge the subreddits, as per their blog again, is
The full results of their analysis are kindly available for interactive visualisation, raw data download and so on here.
But in case anyone is in need of a quick offending, here were the top 5 by algorithmic toxicity. It may not be advisable to visit them on a work computer.
|Rank of bigotry
||Discussion of sexual strategy in a culture increasingly lacking a positive identity for men.
||The Opie and Anthony Show
||The web’s largest atheist forum. All topics related to atheism, agnosticism and secular living are welcome.
||r/sex is for civil discussions about all facets of sexuality and sexual relationships. It is a sex-positive community and a safe space for people of all genders and orientations.
||A subreddit for those who adorn their necks with proud man fur.Neckbeard: A man who is socially inept and physically unappealing, especially one who has an obsessive interest in computing:- Oxford Dictionary
[Edited to correct Ben Bell’s name and column title of table – my apologies!]