
drewdevault.com

[mirror] blog and personal website of Drew DeVault


---
date: 2019-09-23
layout: post
title: "RaptorCS POWER9 Blackbird PC review"
---

**November 2018**: Ordered [Basic Blackbird
Bundle](https://www.raptorcs.com/content/BK1B01/intro.html) w/32 GB RAM:
$1,935.64

**Update 2019-12-23**: This article was originally titled "RaptorCS POWER9
Blackbird PC: An expensive mistake". Please read the follow-up article,
published 2019-10-10:
[RaptorCS's redemption: the POWER9 machine works][followup]

[followup]: https://drewdevault.com/2019/10/10/RaptorCS-redemption.html

**June 2019**

Order ships, and arrives without RAM. It had been long enough that I didn't
realize the order had only been partially fulfilled, so I order some RAM from
the [list of recommended chips][RAM] ($338.40), along with the other necessities
that I didn't purchase from Raptor: a case ($97.99) and a PSU ($68.49), and grab
some hard drives I have lying around. Total cost: about $2,440. Worth it to get
POWER9 builds working on builds.sr.ht!

[RAM]: https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/Memory

I carefully put everything together, consulting the manual at each step, plug in
a display, and turn it on. Lights come on, things start whizzing, and the screen
comes to life - and promptly starts boot looping.

**June 27th**

Support ticket created. What's going on with my board?

**June 28th**

Support gets back to me the next day with a suggestion which is unrelated to the
problem, but no matter - I spoke with volunteers in the IRC channel a few hours
earlier and we found out that - whoops! - I hadn't connected the CPU power to
the motherboard. This is the end of the PEBKAC errors, but not the end of the
problems. The machine gets further into the boot, almost to "petitboot", and
then the display dies and the machine reveals no further secrets.

I send an update to the support team.

**July 1st**

> We have normally only seen this type of failure when there is a RAM-related
> fault, or if the PSU is underpowered enough that bringing the CPUs online at
> full power causes a power fault and immediate safety power off.
>
> Can you watch the internal lights while the system is booting, and see if the
> power LED cluster immediately changes from green to orange as the system stops
> responding over SSH?

The IRC channel suspects this is not related to the problem. Regardless, I reply
a few hours later with two videos showing the boot-up process from power-out to
display death, with the internal LEDs and the display output clearly visible.

**July 4th**

"Any progress on this issue?", I ask.

**July 15th**

"Hi guys, I'm still experiencing this problem. If you're unsure of the issue I
would like to send the board back to you for diagnosis or a refund."

**July 25th**

> Sorry for the delay. Having senior support check out the videos.
>
> Thanks for writing back. We should have something for you by tomorrow during
> the day.

**July 31st**

> Hi Drew.
>
> The videos are being reviewed this week. Thank you for sending them.
>
> Please stay tuned.

**September 15th**

No reply from support. I have since bought a little more hardware for
self-diagnosis, namely the necessary pieces to connect to the two (or is it 3?)
serial ports. I manage to get a log, which points to several failures, but none
of them seem to be related to the problem at hand (they do indicate some network
failures, which would explain why I can't log into the BMC over SSH for further
diagnosis). And the getty is looping, so I can't log in on the serial console to
explore any further.

---

That was a week ago. Radio silence since.

So, 10 months after I placed an order for a POWER9 machine, 3 months after I
received it (without the RAM I purchased, no less), and over $2,500 invested...
it's clear that buying the Blackbird was an expensive mistake. Maybe someday
I'll get it working. If I do, I doubt the "support" team will have been
involved. Currently my best bet seems to be waiting for some apparent staff
member (the only apparent staff member) who idles in the IRC channel on Freenode
and allegedly comes online from time to time.

I'm not alone in having these problems. Here are some (anonymized) quotes I've
heard from others while trying to troubleshoot this on IRC.

On support:

> ugh, ddevault, yeah. [Blackbird ownership] has not been a smooth experience
> for me, either.

> my personal theory is that they have really bad ticket software that 'loses'
> tickets somehow

On reliability:

> I've found openbmc's networking to be... a bit unreliable... maybe 20% of the
> time it does not responed[sic]/does not respond fast enough to networking
> requests.

> yeah the vga handoff failing doesn't surprise me (other people here have
> reported it). but the BMC not getting a DHCP lease is odd. (well maybe not
> that odd if you look at the crumminess of the OpenBMC software stack...)

So, yeah, don't buy from Raptor Computer Systems. It's too large and unwieldy
to be an effective paperweight, either!

---

**Errata**

*2019-09-24 @ 00:19 UTC*: Raptor has reached out and apologized for my support
experience. We are discussing these problems in more detail now. They have also
issued a refund for the unshipped RAM.

*2019-09-24 @ 00:51 UTC*: Raptor believes the CPU to be faulty and is shipping a
replacement. They attribute the delay to having to reach out to IBM about the
problem, but don't have a satisfactory answer to why the support process failed.
I understand it's being discussed internally.

*2019-09-24 @ 13:08 UTC*:

> After investigation, we are implementing new mechanisms to avoid support
> issues like the one you experienced. We now have a self-serve RMA generation
> system which would have significantly reduced your wait time, and are taking
> measures to ensure that tickets are no longer able to be ignored by front line
> support staff. We believe we have addressed the known failure modes at this
> time, and management will be keeping a close eye on the operation of the
> support system to ensure that new failure modes are handled rapidly.

They've tweeted this about their new self-service RMA system as well:

> We've made it easy to submit RMA requests for defective products on our Web
> site. Simply go to your account, select the "Submit RMA Request" link, and
> fill out the form. Your product will be warranty checked and, if valid, you
> will receive an RMA number and shipping address!

— @RaptorCompSys via [Twitter](https://twitter.com/RaptorCompSys/status/1176432946670186498)

I agree that this is a positive change, and it shows a willingness to keep
improving their support experience. Thanks to Raptor for taking these concerns
seriously. I hope to have a working Blackbird system soon, and will publish a
follow-up review when the time comes.

*2019-10-08 @ 22:30 UTC*: A source quoted anonymously in this article asked me to
remove their quote after a change of heart. They feel that the attention this
article has received carried their statement beyond the level of
dissatisfaction they had with Raptor at the time.