Analyzing-HN.md - drewdevault.com - [mirror] blog and personal website of Drew DeVault

Analyzing-HN.md (15980B)

---
date: 2017-09-13
layout: post
title: "Analyzing HN moderation & censorship"
tags: [hacker news]
---
[Hacker News](https://news.ycombinator.com) is a popular
"[hacker](http://www.catb.org/jargon/html/H/hacker.html)" news board. One thing
I love about HN is that the moderation generally does an excellent job. The site
is free of spam and the conversations are usually respectful and meaningful (if
pessimistic at times). However, there is always room for improvement, and
moderation on Hacker News is no exception.
**Notice**: on 2017-10-19 this article was updated to incorporate feedback the
Hacker News moderators sent to me to clarify some of the points herein. You may
view a diff of these changes
[here](https://github.com/SirCmpwn/sircmpwn.github.io/commit/553d051c84a4631c3bd3264a437dfbc6c9807d13).
For some time now, I've been scraping the HN API and website to learn how the
moderators work, and to gather some interesting statistics about posts there
in general. Every 5 minutes, I take a sample of the front page, and every 30
minutes, I sample the top 500 posts (note that HN may return fewer than this
number). During each sample, I record the ID, author, title, URL, status
(dead/flagged/dupe/alive), score, number of comments, rank, and compute the rank
based on [HN's published algorithm](https://news.ycombinator.com/item?id=231209).
A note is made when the title, URL, or status changes.
[![](https://sr.ht/IFCA.png)](https://hn.0x2237.club/post/15217697)
The information gathered is publicly available at
[hn.0x2237.club](https://hn.0x2237.club) (sorry about the stupid domain, I just
picked one at random). You can search for most posts here going back to
2017-04-14, as well as view recent
[title](https://hn.0x2237.club/title-changes) and
[url](https://hn.0x2237.club/url-changes) changes or [deleted
posts](https://hn.0x2237.club/deleted)
([score>10](https://hn.0x2237.club/deleted-10)). Raw data is available as JSON
for any post at `https://hn.0x2237.club/post/:id/json`. Feel free to explore the
site later, or [its shitty code](https://git.sr.ht/~sircmpwn/hnstats). For now,
let's dive into what I've learned from this data.
### Tools HN mods use
The main tools I'm aware of that HN moderators can use to perform their duties
are:
- Editing link titles or URLs
- Influencing story rank via "downweighting" or "burying"
- Deleting or "killing" posts
- Detaching off-topic or rulebreaking comment threads from their parents
- <abbr title="Banning them without making it known to them">Shadowbanning</abbr>
  misbehaving users
- Banning misbehaving users (and telling them)
The moderators emphasize a difference between deleting a post and killing a
post. The former, deleting a post, will remove it from all public view like it
had never existed, and is a tool used infrequently. Killing a post will mark it
as [dead] so it doesn't show up on the post listing.
Influencing a post's rank can also be done through several means of varying
severity. "Burying" a post will leave a post alive, but plunge it in rank.
"Downweighting" is similar, but does not push its rank as far.
There are also automated tools for detecting spam and <abbr title="Posts
influenced by a group of early voters hoping to get it on the front page">voting
rings</abbr>, as well as automated de-emphasizing of posts based on certain
<abbr title="'Bitcoin' was known to at some point be one of these">secret
keywords</abbr> and controls to prevent flamewars. Automated tools on Hacker
News are used to downweight or kill posts, but never to bury or delete them.
Dan spoke about these tools and their usage for me:
>Of these four interventions (deleting, killing, burying, and downweighting),
>the only one that moderators do frequently is downweighting. We downweight
>posts in response to things that go against the site guidelines, such as when a
>submission is unsubstantive, baity or sensational. Typically such posts remain
>on the front page, just at a lower rank. We bury posts when they're dupes,
>but rarely otherwise. We kill posts when they're spam, but rarely
>otherwise. [...] We never delete a post unless the author asks us to.
Dan also further clarified the difference between dead and deleted for me:
>The distinction between 'dead' and 'deleted' is important. Dead posts
>are different from deleted ones in that people can still see them if
>they set 'showdead' to 'yes' in their profile. That way, users who
>want a less moderated view can still see everything that has been
>killed by moderators or software or user flags. Deleted posts, on the
>other hand, are erased from the record and never seen again. On HN,
>authors can delete their own posts for a couple hours (unless they are
>comments that have replies). After that, if they want a post deleted
>they can ask us and we usually are happy to oblige.
Moderators can also artificially influence rank upwards - one way is by inviting
the user to re-submit a post that they want to give another shot at the front
page. This gives the post a healthy upvote to begin with and prevents it from
being flagged. The moderators invited me to re-submit this very article using
this mechanism on 2017-10-19.
Banning users is another mechanism that they can use. There are two ways bans
are typically applied around the net - telling users they've been banned, and
keeping it quiet. The latter - shadowbanning - is a useful tool against spammers
and serial ban evaders who might otherwise try to circumvent their ban. However,
it's important that this does *not* become the first line of defense against
rulebreaking users, who should instead be informed of the reason for their ban
so they have a chance to reform and appeal it. Here's what Dan has to say about
it:
>Shadowbanning has proven to still be useful for spammers and trolls
>(i.e. when a new account shows up and is clearly breaking the site
>guidelines off the bat). Most such abuse is by a relatively small
>number of users who create accounts over and over again to do the same
>things. When there's evidence that we've repeatedly banned someone
>before, I don't feel obliged to tell them we're banning them again.
>[...] When we're banning an established account, though, we post a comment
>saying so, and nearly always only after warning that user beforehand. Many such
>users had no idea they were breaking the site guidelines and are
>quite happy to improve their posts, which is a win for everyone.
Dan also shared a link to search for comments where moderators have explained
to users why they've been banned. Of course, this doesn't include users who were
banned without explanation, or that use slightly different language:
[dang's bans](https://hn.algolia.com/?query=by:dang%20we%20banned&sort=byDate&dateRange=all&type=comment&storyText=false&prefix&page=0)
[sctb's bans](https://hn.algolia.com/?query=by:sctb%20we%20banned&sort=byDate&dateRange=all&type=comment&storyText=false&prefix=false&page=0)
## Data-based insights
Here's an example of a fairly common moderator action:
![](https://sr.ht/PhJM.png)
[This post](https://hn.0x2237.club/post/15217697) had its title changed at
around 09-11-17 12:10 UTC, and had the rank artificially adjusted to push it
further down the front page. We can tell that the drop was artificial just by
correlating it with the known moderator action, but we can also compare it
against the computed base rank:
![](https://sr.ht/IJQI.png)
Note however that the base rank is often wildly different from the rank observed
in practice; the factors that go into adjusting it are rather complex. We can
also see that despite the action, the post's score continued to increase, even
at an accelerated pace:
![](https://sr.ht/FmNU.png)
This "title change and derank" is a fairly common action - here are some more
examples from the past few days:
[Betting on the Web - Why I Build PWAs](https://hn.0x2237.club/post/15219154)
[Silicon Valley is erasing individuality](https://hn.0x2237.club/post/15210767)
[Chinese government is working on a timetable to end sales of fossil-fuel cars](https://hn.0x2237.club/post/15208565)
Users can change their own post titles, which I'm unable to distinguish from
moderator changes. However, correlating them with a strange change in rank is
generally a good bet. Submitters also generally will edit their titles earlier
rather than later, so a later change may indicate that it was seen by a
moderator after it rose some distance up the page.
I also occasionally find what seems to be the opposite - artificially bumping a
post further up the page. Here's two examples:
[15213371](https://hn.0x2237.club/post/15213371) and
[15209377](https://hn.0x2237.club/post/15209377). Rank influencing in either
direction also happens without an associated title or URL change, but
automatically pinning such events down is a bit more subtle than my tools can
currently handle.
Moderators can also delete a post or indicate it as a dupe. The latter can be
(and is) detected by my tools, but the former is indistinguishable from the user
opting to delete posts themselves. In theory, posts that are deleted *after* the
author is no longer allowed to could be detected, but this happens rarely and my
tools don't track posts once they get old enough.
### Flagging
The users have some moderation tools at their disposal, too - downvotes,
flagging, and vouching. When a comment is downvoted, it is moved towards the
bottom of the thread and is gradually colored grayer to become less visible, and
can be reversed with upvotes. When a comment gets enough flags, it is removed
entirely unless you have showdead enabled in your profile. Flagged posts are
downweighted or killed when enough flags accumulate. These posts are moved to
the bottom of the ranked posts even if you have showdead enabled, and can also
be seen in /new. Flagging can be reversed with the vouch feature, but flagged
stories are almost never vouched back into existence.
**Note**: detection of post flagged status is very buggy with my tools. The API
exposes a boolean for dead posts, so I have to fall back on scraping to
distinguish between different kinds of dead-ness. But this is pretty buggy, so I
encourage you to examine the post yourself when browsing my site if in doubt.
### Are these tools abused for censorship?
Well, with all of this data, was I able to find evidence of censorship? There
are two answers: yes and maybe. The "yes" is because users are *definitely*
abusing the flagging feature. The "maybe" is because moderator action leaves
room for interpretation. I'll get to that later, but let's start with flagging
abuse.
#### Censorship by users
The threshold for removing a story due to flags is rather low, though I don't
know the exact number. Here are some posts whose flags I consider questionable:
[Harvey, the Storm That Humans Helped Cause](https://hn.0x2237.club/post/15129859) (23 points)
[ES6 imports syntax considered harmful](https://hn.0x2237.club/post/15116132) (12 points)
[White-Owned Restaurants Shamed for Serving Ethnic Food](https://hn.0x2237.club/post/14415411) (33 points)
[The evidence is piling up – Silicon Valley is being destroyed](https://hn.0x2237.club/post/14152602) (27 points)
A good place to discover these sorts of events is to browse hnstats for posts
deleted with a score [>10 points](https://hn.0x2237.club/deleted-10). There are
also occasions where the flags seem to be due to a poor title, which is a
fixable problem for which flagging is a harsh solution:
[Poettering downvoted 5 (at time of this writing) times](https://hn.0x2237.club/post/14679207)
[Germany passes law restricting free speech on the internet](https://hn.0x2237.club/post/14676296)
The main issue with flags is that they're often used as an alternative to the
HN's (by design) lack of a downvoting feature. HN also gives users no guidelines
on *why* they should flag posts, which mixes poorly with automated removal of a
post given enough flags.
#### Censorship by moderators
Moderator actions are a bit more difficult to judge. Moderation on HN is a black
box - most of the time, moderators don't make the reasoning behind their actions
clear. Many of their actions (such as rank influence) are also subtle and easy
to miss. Thankfully they are often receptive to being asked why some moderation
occurred, but only as often as not.
Anecdotally, I also find that moderators occasionally moderate selectively, and
keep quiet in the face of users asking them why. Notably this is a problem for
<abbr title="links for which you have to pay money to read the
content">paywalled</abbr> articles, which are [against the
rules](https://news.ycombinator.com/newsfaq.html) but are often allowed to
remain.
Dan sent me a response to this section:
>[It's true that we don't explain our actions], but mostly because it would be
>hopeless to try. We could do that all day and still not make everything clear,
>because the quantity is overwhelming and the cost of a high-quality explanation
>is steep. Moreover the experiment would be impossible to run because one
>would die of boredom long before reaching 100%. Our solution to this
>conundrum is not to try to explain everything but to answer specific
>questions as best we can. We don't answer every question, but that's
>mostly because we don't see every question. If people ask us things on
>HN itself, odds are we won't see it (also, the site guidelines ask
>users not to do this, per ([our
>guidelines](https://news.ycombinator.com/newsguidelines.html)). If they
>[email us](mailto:hn@ycombinator.com), the probability of a
>response approaches 1.
I can attest personally to success reaching out to hn@ycombinator.com for
clarification and even reversal of some moderator decisions, though at a
response ratio further from 1 than this implies. That being said, I don't think
that private discourse between the submitter and the moderators is the only
solution. Other people may be invested in the topic, too - users who upvoted the
story might not notice its disappearance, but would like more attention drawn to
the topic and enjoy more discussion. Commenters are even more invested in the
posts. The submitter is not the only one whoses interests are at stake. This is
even more of a problem for posts which are moderated via user flags - the HN
mods are pretty discretionate but users are much less so.
Explaining every action is not necessary - I don't think anyone needs you to
explain why someone was banned when they were submitting links to earn money at
home in your spare time. However, I think a public audit log of moderator
actions would go a long way, and could be done by software - avoiding the need
to explain everything. I envision a change to your UI for banning users or
moderating posts with that adds a dropdown of common reasons and a textbox for
further elaboration when appropriate - then makes this information appear on
/moderation.
### Conclusions
I should again emphasize that most moderator actions are benign and agreeable.
They do a great job on the whole, but striving to do even better would be
admirable. I suggest a few changes:
- Make a public audit log of moderation activity, or at least reach out to me to
  see what small changes could be done to help improve my statistics gathering.
- Minimize use of more subtle actions like rank influence, and when used,
- More frequently leave comments on posts where moderation has occurred
  explaining the rationale and opening an avenue for public discussion and/or
  appeal.
- Put flagged posts into a queue for moderator review and don't remove posts
  simply because they're flagged.
- Consider appointing one or two moderators from the community, ideally people
  with less bias towards SV or startup culture.
Hacker News is a great place for just that - hacker news. It has been for a long
time and I hope it continues to be. Let's work together on running it
transparently to the benefit of all.