drewdevault.com

[mirror] blog and personal website of Drew DeVault git clone https://hacktivis.me/git/mirror/drewdevault.com.git

---
date: 2019-10-10
layout: post
title: "RaptorCS's redemption: the POWER9 machine works"
---

This is a follow-up to my earlier article, "[RaptorCS POWER9 Blackbird PC: An
expensive mistake][previous]". Since I published that article, I've been in
touch with Raptor and they've been much more communicative and helpful. I now
have a working machine!

---

Update Feb. 2024: Seems the improvements I asked for may not have stuck. [Buyer
beware](https://posixcafe.org/blogs/2024/02/26/0/).

---

[previous]: https://drewdevault.com/2019/09/23/RaptorCS-Blackbird-a-horror-story.html

![Picture of uname -sm showing "Linux ppc64le"](https://sr.ht/OTyo.jpeg)

After I published my article, Raptor reached out and apologised for my
experience. They offered a full refund, but I agreed to work on further
diagnosis now that we had opened a dialogue[^1]. They identified that my CPU was
defective and sent me a replacement, then we found the mainboard to be
defective, too, and the whole thing was shipped back and replaced. I installed
the new hardware into the datacenter today and it was quite pleasant to get up
and running. Raptor assures me that my nightmares with the old board are
atypical, and if the new board is representative of the usual user experience, I
would have to agree. The installation was completely painless.[^2]

[^1]: They did refund the RAM which was unfulfilled from my original order.
[^2]: They did give me a little heart attack, however, by sending the replacement CPU to me in the same box I had used to return the faulty CPU to them - a box which I had labelled "BAD CPU".

However, I refuse to give any company credit for waking up their support team
only when a scathing article about them frontpages on Hacker News. I told them I
wouldn't publish a positive follow-up unless they also convinced me that the
support experience had been fixed for the typical user as well. To this end,
Raptor has made a number of substantive changes. To quote their support staff:

> After investigation, we are implementing new mechanisms to avoid support
> issues like the one you experienced. We now have a
> [self-serve RMA generation system](https://twitter.com/RaptorCompSys/status/1176432946670186498)
> which would have significantly reduced your wait time, and are taking measures
> to ensure that tickets are no longer able to be ignored by front line support
> staff. We believe we have addressed the known failure modes at this time, and
> management will be keeping a close eye on the operation of the support system
> to ensure that new failure modes are handled rapidly.

They've tweeted this about their new self-service RMA system as well:

> We've made it easy to submit RMA requests for defective products on our Web
> site. Simply go to your account, select the "Submit RMA Request" link, and
> fill out the form. Your product will be warranty checked and, if valid, you
> will receive an RMA number and shipping address!

-- @RaptorCompSys via [Twitter](https://twitter.com/RaptorCompSys/status/1176432946670186498)

They're also working on other improvements to make the end-user experience
better, including [more content on the
wiki](https://wiki.raptorcs.com/wiki/Main_Page), such as a [flowchart for
dealing with common
problems](https://wiki.raptorcs.com/wiki/Troubleshooting/Support_Request_Checklist).

Thanks to Raptor for taking the problem seriously, quickly fixing the problems
with my board, and addressing the systemic problems which led to the
failure of their support system.

On the subject of the working machine, I am quite impressed with it so far.
Installation was a breeze, it compiles the kernel on 32 threads from spinning
rust in 4m15s, and I was able to get KVM working without much effort. I have
christened it "flandre"[^3], which I think is fitting. I plan on bringing it up
as a build slave for builds.sr.ht in the coming weeks/months, and offering
ppc64le builds on Sourcehut in the near future. I also have another board,
generously donated by another Raptor customer[^4]; it arrived last week, and I
hope to bring it up and use it for testing Wayland before introducing it to the
Sourcehut fleet.

[^3]: Sourcehut virtual machines are named after their purpose, but our physical servers are named after [Touhou](https://en.wikipedia.org/wiki/Touhou_Project) characters.
[^4]: This happened prior to any of the problems with the first machine.

---

P.S. For those interested in more details of the actual failures:

This machine is so badly broken that it would actually be hilarious if the
manufacturer had been more present in the troubleshooting process. I think the
best way to sum it up is "FUBAR". Among the problems I encountered were:

- The CPU experiences a "ZCAL failure" (???)
- The BMC (responsible for bringing up the main CPU(s)) had broken ethernet,
  making login over SSH impossible
- The BMC's getty would boot loop, making login over serial impossible
- The BMC's U-Boot would boot loop if the TX pin on the serial cable was plugged
  in, making diagnosing issues from that stage impossible
- petitboot's ncurses output was being piped into a shell and executed (what the fuck?)

In the immortal words of James Mickens, "I HAVE NO TOOLS BECAUSE I HAVE
DESTROYED MY TOOLS WITH MY TOOLS." A staff member at Raptor tells me:

"Your box ended up on my desk [...] This is easily the most broken board I've
seen, ever, and that includes prototypes. This will help educate us for a while
to come due to the unique nature of some of the faults."

Not sure what can cause such an impressive cacophony of failures, but it's so
catastrophic that I can easily believe that this is far from typical. The
hardware is back in Raptor's hands now, and I would be interested to hear about
their insights after further diagnosis.