drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

Analyzing-HN.md (15980B)


  1. ---
  2. date: 2017-09-13
  3. layout: post
  4. title: "Analyzing HN moderation & censorship"
  5. tags: [hacker news]
  6. ---
  7. [Hacker News](https://news.ycombinator.com) is a popular
  8. "[hacker](http://www.catb.org/jargon/html/H/hacker.html)" news board. One thing
  9. I love about HN is that the moderation generally does an excellent job. The site
  10. is free of spam and the conversations are usually respectful and meaningful (if
  11. pessimistic at times). However, there is always room for improvement, and
  12. moderation on Hacker News is no exception.
  13. **Notice**: on 2017-10-19 this article was updated to incorporate feedback the
  14. Hacker News moderators sent to me to clarify some of the points herein. You may
  15. view a diff of these changes
  16. [here](https://github.com/SirCmpwn/sircmpwn.github.io/commit/553d051c84a4631c3bd3264a437dfbc6c9807d13).
  17. For some time now, I've been scraping the HN API and website to learn how the
  18. moderators work, and to gather some interesting statistics about posts there
  19. in general. Every 5 minutes, I take a sample of the front page, and every 30
  20. minutes, I sample the top 500 posts (note that HN may return fewer than this
  21. number). During each sample, I record the ID, author, title, URL, status
  22. (dead/flagged/dupe/alive), score, number of comments, and observed rank, and compute a base rank
  23. from [HN's published algorithm](https://news.ycombinator.com/item?id=231209).
  24. A note is made when the title, URL, or status changes.
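
A minimal sketch of how such a base rank could be computed, assuming the formula in the linked comment is `(p - 1) / (t + 2)^1.5` with `p` as points and `t` as age in hours. The `Sample` type, the function names, and the sample values are hypothetical and are not taken from hnstats; the exponent is left as a parameter because the value HN uses in production is not confirmed here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    post_id: int
    score: int        # points at sampling time
    age_hours: float  # hours since submission
    rank: int         # observed position on the page, 1 = top


def base_score(points: int, age_hours: float, gravity: float = 1.5) -> float:
    """Assumed formula from the linked comment: (p - 1) / (t + 2)^gravity."""
    return (points - 1) / (age_hours + 2) ** gravity


def base_ranks(samples: list[Sample]) -> dict[int, int]:
    """Map each post ID to the rank it would hold if only base_score mattered."""
    ordered = sorted(
        samples,
        key=lambda s: base_score(s.score, s.age_hours),
        reverse=True,
    )
    return {s.post_id: i for i, s in enumerate(ordered, start=1)}


# Made-up sample of three posts, for illustration only.
page = [
    Sample(post_id=1, score=120, age_hours=5.0, rank=1),
    Sample(post_id=2, score=300, age_hours=20.0, rank=2),
    Sample(post_id=3, score=40, age_hours=1.0, rank=3),
]
ranks = base_ranks(page)
```

Comparing `ranks[post_id]` with the observed `rank` gives a rough signal of how far a post sits from where votes and age alone would put it.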
  25. [![](https://sr.ht/IFCA.png)](https://hn.0x2237.club/post/15217697)
  26. The information gathered is publicly available at
  27. [hn.0x2237.club](https://hn.0x2237.club) (sorry about the stupid domain, I just
  28. picked one at random). You can search for most posts here going back to
  29. 2017-04-14, as well as view recent
  30. [title](https://hn.0x2237.club/title-changes) and
  31. [url](https://hn.0x2237.club/url-changes) changes or [deleted
  32. posts](https://hn.0x2237.club/deleted)
  33. ([score>10](https://hn.0x2237.club/deleted-10)). Raw data is available as JSON
  34. for any post at `https://hn.0x2237.club/post/:id/json`. Feel free to explore the
  35. site later, or [its shitty code](https://git.sr.ht/~sircmpwn/hnstats). For now,
  36. let's dive into what I've learned from this data.
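
As a rough illustration of the raw JSON endpoint mentioned above, the following sketch builds the URL for a post and fetches it with the standard library. The helper names are hypothetical, the shape of the returned JSON is not assumed, and the site may not be reachable when you read this.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://hn.0x2237.club"


def post_json_url(post_id: int) -> str:
    """Fill the :id placeholder of the /post/:id/json route."""
    return f"{BASE_URL}/post/{post_id}/json"


def fetch_post(post_id: int, timeout: float = 10.0) -> Any:
    """Download and decode the JSON document for one post."""
    with urllib.request.urlopen(post_json_url(post_id), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_post(15217697)` would request the data for the post discussed below.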
  37. ### Tools HN mods use
  38. The main tools I'm aware of that HN moderators can use to perform their duties
  39. are:
  40. - Editing link titles or URLs
  41. - Influencing story rank via "downweighting" or "burying"
  42. - Deleting or "killing" posts
  43. - Detaching off-topic or rulebreaking comment threads from their parents
  44. - <abbr title="Banning them without making it known to them">Shadowbanning</abbr>
  45. misbehaving users
  46. - Banning misbehaving users (and telling them)
  47. The moderators emphasize a difference between deleting a post and killing a
  48. post. Deleting a post removes it from all public view as if it
  49. had never existed, and is a tool used infrequently. Killing a post marks it
  50. as [dead] so it doesn't show up on the post listing.
  51. Influencing a post's rank can also be done through several means of varying
  52. severity. "Burying" a post will leave a post alive, but plunge it in rank.
  53. "Downweighting" is similar, but does not push its rank as far.
  54. There are also automated tools for detecting spam and <abbr title="Posts
  55. influenced by a group of early voters hoping to get it on the front page">voting
  56. rings</abbr>, as well as automated de-emphasizing of posts based on certain
  57. <abbr title="'Bitcoin' was known to at some point be one of these">secret
  58. keywords</abbr> and controls to prevent flamewars. Automated tools on Hacker
  59. News are used to downweight or kill posts, but never to bury or delete them.
  60. Dan spoke about these tools and their usage for me:
  61. >Of these four interventions (deleting, killing, burying, and downweighting),
  62. >the only one that moderators do frequently is downweighting. We downweight
  63. >posts in response to things that go against the site guidelines, such as when a
  64. >submission is unsubstantive, baity or sensational. Typically such posts remain
  65. >on the front page, just at a lower rank. We bury posts when they're dupes,
  66. >but rarely otherwise. We kill posts when they're spam, but rarely
  67. >otherwise. [...] We never delete a post unless the author asks us to.
  68. Dan also further clarified the difference between dead and deleted for me:
  69. >The distinction between 'dead' and 'deleted' is important. Dead posts
  70. >are different from deleted ones in that people can still see them if
  71. >they set 'showdead' to 'yes' in their profile. That way, users who
  72. >want a less moderated view can still see everything that has been
  73. >killed by moderators or software or user flags. Deleted posts, on the
  74. >other hand, are erased from the record and never seen again. On HN,
  75. >authors can delete their own posts for a couple hours (unless they are
  76. >comments that have replies). After that, if they want a post deleted
  77. >they can ask us and we usually are happy to oblige.
  78. Moderators can also artificially influence rank upwards - one way is by inviting
  79. the user to re-submit a post that they want to give another shot at the front
  80. page. This gives the post a healthy upvote to begin with and prevents it from
  81. being flagged. The moderators invited me to re-submit this very article using
  82. this mechanism on 2017-10-19.
  83. Banning users is another mechanism that they can use. There are two ways bans
  84. are typically applied around the net - telling users they've been banned, and
  85. keeping it quiet. The latter - shadowbanning - is a useful tool against spammers
  86. and serial ban evaders who might otherwise try to circumvent their ban. However,
  87. it's important that this does *not* become the first line of defense against
  88. rulebreaking users, who should instead be informed of the reason for their ban
  89. so they have a chance to reform and appeal it. Here's what Dan has to say about
  90. it:
  91. >Shadowbanning has proven to still be useful for spammers and trolls
  92. >(i.e. when a new account shows up and is clearly breaking the site
  93. >guidelines off the bat). Most such abuse is by a relatively small
  94. >number of users who create accounts over and over again to do the same
  95. >things. When there's evidence that we've repeatedly banned someone
  96. >before, I don't feel obliged to tell them we're banning them again.
  97. >[...] When we're banning an established account, though, we post a comment
  98. >saying so, and nearly always only after warning that user beforehand. Many such
  99. >users had no idea they were breaking the site guidelines and are
  100. >quite happy to improve their posts, which is a win for everyone.
  101. Dan also shared links to searches for comments where moderators have explained
  102. to users why they've been banned. Of course, these don't include users who were
  103. banned without explanation, or explanations that use slightly different language:
  104. [dang's bans](https://hn.algolia.com/?query=by:dang%20we%20banned&sort=byDate&dateRange=all&type=comment&storyText=false&prefix&page=0)
  105. [sctb's bans](https://hn.algolia.com/?query=by:sctb%20we%20banned&sort=byDate&dateRange=all&type=comment&storyText=false&prefix=false&page=0)
  106. ## Data-based insights
  107. Here's an example of a fairly common moderator action:
  108. ![](https://sr.ht/PhJM.png)
  109. [This post](https://hn.0x2237.club/post/15217697) had its title changed at
  110. around 2017-09-11 12:10 UTC, and had its rank artificially adjusted to push it
  111. further down the front page. We can tell that the drop was artificial just by
  112. correlating it with the known moderator action, but we can also compare it
  113. against the computed base rank:
  114. ![](https://sr.ht/IJQI.png)
  115. Note however that the base rank is often wildly different from the rank observed
  116. in practice; the factors that go into adjusting it are rather complex. We can
  117. also see that despite the action, the post's score continued to increase, even
  118. at an accelerated pace:
  119. ![](https://sr.ht/FmNU.png)
  120. This "title change and derank" is a fairly common action - here are some more
  121. examples from the past few days:
  122. [Betting on the Web - Why I Build PWAs](https://hn.0x2237.club/post/15219154)
  123. [Silicon Valley is erasing individuality](https://hn.0x2237.club/post/15210767)
  124. [Chinese government is working on a timetable to end sales of fossil-fuel cars](https://hn.0x2237.club/post/15208565)
  125. Users can change their own post titles, which I'm unable to distinguish from
  126. moderator changes. However, correlating them with a strange change in rank is
  127. generally a good bet. Submitters also generally will edit their titles earlier
  128. rather than later, so a later change may indicate that it was seen by a
  129. moderator after it rose some distance up the page.
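
The correlation described here can be sketched as a simple heuristic: walk consecutive samples of one post and report the moments where the title changed and the rank fell sharply in the same interval. The `Snapshot` type, the `min_drop` threshold, and the sample titles are hypothetical; this is not how hnstats is implemented, and it cannot tell a moderator edit from a submitter edit any better than the reasoning above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    minute: int  # minutes since the first sample of this post
    title: str
    rank: int    # observed front-page rank, 1 = top


def title_change_deranks(history: list[Snapshot], min_drop: int = 5) -> list[int]:
    """Return the minutes at which the title changed and rank fell by at least min_drop."""
    events = []
    for prev, cur in zip(history, history[1:]):
        title_changed = cur.title != prev.title
        rank_drop = cur.rank - prev.rank  # positive means further down the page
        if title_changed and rank_drop >= min_drop:
            events.append(cur.minute)
    return events


# Made-up history sampled every 5 minutes, for illustration only.
history = [
    Snapshot(0, "Shocking: X is dead", 4),
    Snapshot(5, "Shocking: X is dead", 3),
    Snapshot(10, "X ends support for Y", 14),
    Snapshot(15, "X ends support for Y", 15),
]
events = title_change_deranks(history)
```

A larger `min_drop` trades missed events for fewer false positives from ordinary rank churn.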
  130. I also occasionally find what seems to be the opposite - artificially bumping a
  131. post further up the page. Here are two examples:
  132. [15213371](https://hn.0x2237.club/post/15213371) and
  133. [15209377](https://hn.0x2237.club/post/15209377). Rank influencing in either
  134. direction also happens without an associated title or URL change, but
  135. automatically pinning such events down is a bit more subtle than my tools can
  136. currently handle.
  137. Moderators can also delete a post or indicate it as a dupe. The latter can be
  138. (and is) detected by my tools, but the former is indistinguishable from the user
  139. opting to delete posts themselves. In theory, posts that are deleted *after* the
  140. author is no longer allowed to delete them could be detected, but this happens rarely and my
  141. tools don't track posts once they get old enough.
  142. ### Flagging
  143. The users have some moderation tools at their disposal, too - downvotes,
  144. flagging, and vouching. When a comment is downvoted, it is moved towards the
  145. bottom of the thread and is gradually colored grayer to become less visible; this
  146. can be reversed with upvotes. When a comment gets enough flags, it is removed
  147. entirely unless you have showdead enabled in your profile. Flagged posts are
  148. downweighted or killed when enough flags accumulate. These posts are moved to
  149. the bottom of the ranked posts even if you have showdead enabled, and can also
  150. be seen in /new. Flagging can be reversed with the vouch feature, but flagged
  151. stories are almost never vouched back into existence.
  152. **Note**: detection of post flagged status is very buggy with my tools. The API
  153. only exposes a boolean for dead posts, so I have to fall back on scraping to
  154. distinguish between different kinds of dead-ness. The scraping is unreliable, so I
  155. encourage you to examine the post yourself when browsing my site if in doubt.
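
The two-step classification described in this note might look like the sketch below. It assumes the API item is a dict with an optional `dead` boolean, and that the scraping step (not shown) yields the bracketed markers rendered next to a title, such as `[flagged]` or `[dupe]`. The function name and the exact marker strings are assumptions for illustration.

```python
def classify_status(api_item: dict, scraped_markers: set[str]) -> str:
    """Combine the API's dead flag with scraped markers into one of four statuses."""
    if not api_item.get("dead"):
        return "alive"
    # The API only says "dead"; the scraped markers say which kind.
    if "[flagged]" in scraped_markers:
        return "flagged"
    if "[dupe]" in scraped_markers:
        return "dupe"
    return "dead"
```

If the scrape fails and returns no markers, a flagged or dupe post falls through to plain `dead`, which is one way misclassifications like the ones mentioned above can arise.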
  156. ### Are these tools abused for censorship?
  157. Well, with all of this data, was I able to find evidence of censorship? There
  158. are two answers: yes and maybe. The "yes" is because users are *definitely*
  159. abusing the flagging feature. The "maybe" is because moderator action leaves
  160. room for interpretation. I'll get to that later, but let's start with flagging
  161. abuse.
  162. #### Censorship by users
  163. The threshold for removing a story due to flags is rather low, though I don't
  164. know the exact number. Here are some posts whose flags I consider questionable:
  165. [Harvey, the Storm That Humans Helped Cause](https://hn.0x2237.club/post/15129859) (23 points)
  166. [ES6 imports syntax considered harmful](https://hn.0x2237.club/post/15116132) (12 points)
  167. [White-Owned Restaurants Shamed for Serving Ethnic Food](https://hn.0x2237.club/post/14415411) (33 points)
  168. [The evidence is piling up – Silicon Valley is being destroyed](https://hn.0x2237.club/post/14152602) (27 points)
  169. A good place to discover these sorts of events is to browse hnstats for posts
  170. deleted with a score [>10 points](https://hn.0x2237.club/deleted-10). There are
  171. also occasions where the flags seem to be due to a poor title, which is a
  172. fixable problem for which flagging is a harsh solution:
  173. [Poettering downvoted 5 (at time of this writing) times](https://hn.0x2237.club/post/14679207)
  174. [Germany passes law restricting free speech on the internet](https://hn.0x2237.club/post/14676296)
  175. The main issue with flags is that they're often used as a substitute for the
  176. story downvote that HN (by design) lacks. HN also gives users no guidelines
  177. on *why* they should flag posts, which mixes poorly with automated removal of a
  178. post given enough flags.
  179. #### Censorship by moderators
  180. Moderator actions are a bit more difficult to judge. Moderation on HN is a black
  181. box - most of the time, moderators don't make the reasoning behind their actions
  182. clear. Many of their actions (such as rank influence) are also subtle and easy
  183. to miss. Thankfully they are receptive to being asked why some moderation
  184. occurred, but only about as often as not.
  185. Anecdotally, I also find that moderators occasionally moderate selectively, and
  186. keep quiet in the face of users asking them why. Notably this is a problem for
  187. <abbr title="links for which you have to pay money to read the
  188. content">paywalled</abbr> articles, which are [against the
  189. rules](https://news.ycombinator.com/newsfaq.html) but are often allowed to
  190. remain.
  191. Dan sent me a response to this section:
  192. >[It's true that we don't explain our actions], but mostly because it would be
  193. >hopeless to try. We could do that all day and still not make everything clear,
  194. >because the quantity is overwhelming and the cost of a high-quality explanation
  195. >is steep. Moreover the experiment would be impossible to run because one
  196. >would die of boredom long before reaching 100%. Our solution to this
  197. >conundrum is not to try to explain everything but to answer specific
  198. >questions as best we can. We don't answer every question, but that's
  199. >mostly because we don't see every question. If people ask us things on
  200. >HN itself, odds are we won't see it (also, the site guidelines ask
  201. >users not to do this, per [our
  202. >guidelines](https://news.ycombinator.com/newsguidelines.html)). If they
  203. >[email us](mailto:hn@ycombinator.com), the probability of a
  204. >response approaches 1.
  205. I can attest personally to success reaching out to hn@ycombinator.com for
  206. clarification and even reversal of some moderator decisions, though at a
  207. response ratio further from 1 than this implies. That being said, I don't think
  208. that private discourse between the submitter and the moderators is the only
  209. solution. Other people may be invested in the topic, too - users who upvoted the
  210. story might not notice its disappearance, but would like more attention drawn to
  211. the topic and enjoy more discussion. Commenters are even more invested in the
  212. posts. The submitter is not the only one whose interests are at stake. This is
  213. even more of a problem for posts which are moderated via user flags - the HN
  214. mods exercise reasonable discretion, but users much less so.
  215. Explaining every action is not necessary - I don't think anyone needs you to
  216. explain why someone was banned when they were submitting links to earn money at
  217. home in your spare time. However, I think a public audit log of moderator
  218. actions would go a long way, and could be done by software - avoiding the need
  219. to explain everything. I envision a change to your UI for banning users or
  220. moderating posts that adds a dropdown of common reasons and a textbox for
  221. further elaboration when appropriate - then makes this information appear on
  222. /moderation.
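
To make the proposal concrete, one entry in such an audit log could be modeled as below. Everything here is hypothetical: the list of reasons, the field names, and the JSON layout are illustrative choices, not anything HN has or has committed to.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Reason(Enum):
    """Hypothetical contents of the 'common reasons' dropdown."""
    SPAM = "spam"
    DUPLICATE = "duplicate"
    MISLEADING_TITLE = "misleading title"
    OFF_TOPIC = "off-topic"
    OTHER = "other"


@dataclass
class AuditEntry:
    timestamp: str  # ISO 8601, UTC
    action: str     # e.g. "title-change", "downweight", "kill", "ban"
    target: str     # post ID or username
    reason: Reason
    note: str = ""  # optional free-text elaboration from the textbox

    def to_json(self) -> str:
        """Serialize the entry as one JSON object for a public log page."""
        record = asdict(self)
        record["reason"] = self.reason.value
        return json.dumps(record, sort_keys=True)


# Placeholder values, not a real moderation event.
entry = AuditEntry(
    timestamp="2017-01-01T00:00:00Z",
    action="title-change",
    target="12345",
    reason=Reason.MISLEADING_TITLE,
)
line = entry.to_json()
```

Because the reason comes from a fixed list, logging an action costs a moderator one selection, and the free-text note stays optional.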
  223. ### Conclusions
  224. I should again emphasize that most moderator actions are benign and agreeable.
  225. They do a great job on the whole, but striving to do even better would be
  226. admirable. I suggest a few changes:
  227. - Make a public audit log of moderation activity, or at least reach out to me to
  228. see what small changes could be done to help improve my statistics gathering.
  229. - Minimize use of more subtle actions like rank influence and, when they are used,
  230. more frequently leave comments on posts where moderation has occurred
  231. explaining the rationale and opening an avenue for public discussion and/or
  232. appeal.
  233. - Put flagged posts into a queue for moderator review and don't remove posts
  234. simply because they're flagged.
  235. - Consider appointing one or two moderators from the community, ideally people
  236. with less bias towards SV or startup culture.
  237. Hacker News is a great place for just that - hacker news. It has been for a long
  238. time and I hope it continues to be. Let's work together on running it
  239. transparently to the benefit of all.