How SingleStore Saved Otherboard

Editor’s Note: This is a very technical post. You may find it boring 😴. Or it may blow your mind 🤯. Either way, we’re glad you’re here.

As we mentioned in our 90-day recap post, we hit some real infrastructure woes far earlier than we hoped we might.

was this a harbinger of things to come? yes, dear reader, it was!

The truth is, we thought we’d be able to manage scale via RDS for the first 3-9 months, give or take. There are lots of schools of thought on scaling in the software development world. The general ends of the spectrum seem to be:

  1. Build for scale! Expect insanity at the very beginning!
  2. Worry about scale when you get to scale. Great problem to have, but 99% of people won’t really ever have that problem.

I’d like to say my mentality was measured, somewhere in the middle of that spectrum. The problem at hand was that our calculations were off by orders of magnitude.

When You Can’t Make It Up in Volume

Ever hear the story about the guy selling $20 bills out of his trunk for $15? A savvy bargain hunter comes up, buys a 10-pack for $150, and says, “My guy! How are you ever going to make money selling $20 bills for $15?” To which the shady character – who was probably just passing counterfeit bills – said, “We’ll make it up in volume!”

It’s a silly story, but it outlines a way of thinking that founders like myself can often get caught up in. We think we’ll hit some significant infrastructure costs up front, possibly, but we can just keep bringing on more customers, and costs will stay relatively steady. We’ll make it up in volume!

The problem is that, at least in our case, making it up in volume was not tracking as a feasible reality. Here are a few screenshots from our Slack channel during December. For reference, we launched anticipating infrastructure costs of around $1,200 – $1,800.

cooooool cool cool, double what we were expecting. no big deal.
couple days later, forecasting almost double the previous forecast. cooooooooool cool cool.
At this point, we’re two weeks post-launch and I’m sweating bullets.
the sweating of bullets has now, somehow, morphed into sweating grenades. also, the AWS experts? not helpful at all.
After a few weeks of working on it, we were making slow progress. We peaked at nearly $1,000 per day in costs, with a forecasted monthly infrastructure bill of over $20,000.
Where we’re at today. Average daily costs around $25-$30. A far cry from the terrifying $1,000+/day numbers we were headed for.

The Problem

We were hemorrhaging money with no end in sight. We had clarity on what was costing us and where, and burgeoning clarity on why it was happening. All that was really left to figure out was the root cause of the problem in context, and how to fix it accordingly.

We saw the same screenshots you all see above and recognized a combination of factors at play: EC2, Lambda, RDS, and SQS. For the uninitiated (and to be clear, I consider myself barely initiated): Lambda is the “serverless” (aka a server, just not mine) compute that runs our application code, EC2 is the virtual server hardware we’re actually sitting on, and SQS is the queue handling all the heavy-lifting work in the background without slowing down the application itself. RDS is just Amazon’s database service (one of them, anyway. Some argue that all Amazon services are database services 🤣). AWS experts, please feel free to point out the areas in which I have dummified these explanations 🤣.
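To make that a bit more concrete, here’s a rough sketch of the pattern we were leaning on – the queue URL and job fields are made up for illustration, not our actual code: the app answers the user right away and tosses the heavy work onto SQS for later.

```python
# Hypothetical sketch of the pre-migration pattern: the app pushes slow,
# heavy work onto an SQS queue so the request itself stays fast.
# The queue URL and payload fields are placeholders, not our real ones.
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-jobs"  # placeholder

def handle_request(post_id: int) -> dict:
    """Respond to the user immediately and queue the heavy lifting."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job": "analyze_post", "post_id": post_id}),
    )
    return {"status": "queued", "post_id": post_id}
```

Nothing exotic – but every hop between Lambda, SQS, and RDS is metered, and that’s where the bill came from.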

With the combination of issues at play, there was no single fix that was going to get us where we needed to be. But, after a deep-dive into how AWS calculates ingress and egress between each of these services, as well as cost structures of external versus internal traffic – we knew that our strategy needed to include (at least) three legs, and one of those was vital to start with:

  1. Move away from RDS entirely to SingleStore
  2. Move away from SQS to Redis
  3. Move away from EC2 to EKS*

* I’m told this is moderately convoluted and misleading, as EC2 on EKS is a closer representation of what we’re doing now. As a reminder, I have no idea what I’m doing.

The Solution

So, given that we had some measure of clarity forming around the problems at hand, as well as likely solutions to consider, the next step was planning out the order and priority of execution.

Moving to SingleStore meant drastically changing how our data modeling and architecture worked. You might remember from my snarky captions above that the AWS Solutions Architects were approximately as helpful as a brick to the face. The SingleStore engineers were the exact opposite. No, not a brick to the back of the head – they were, in a word, amazing.

The SingleStore account executives, engineering team, and really everyone we’ve connected with on their team have been absolute experts. From helping us understand SingleStore more deeply, to helping us identify the right ways to structure complex queries within SingleStore to minimize CPU usage, they’ve been legendarily good.
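To give a flavor of what “structuring queries to minimize CPU” looked like in practice, here’s a minimal, hypothetical sketch – the table, columns, and credentials are invented, not our real schema – of the kind of thing the SingleStore folks pushed us toward: pick a shard key that matches how you actually filter, and a sort key that matches how you scan, so aggregations stay on one partition instead of shuffling rows around.

```python
# Hypothetical sketch of a SingleStore-friendly schema and query.
# Table name, columns, host, and credentials are all placeholders.
import pymysql

conn = pymysql.connect(
    host="svc-example.singlestore.com",  # placeholder host
    user="app", password="secret", database="example_db",
)

DDL = """
CREATE TABLE IF NOT EXISTS post_metrics (
    site_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL,
    metric_date DATE NOT NULL,
    pageviews BIGINT NOT NULL DEFAULT 0,
    SORT KEY (metric_date),   -- columnstore sort key: cheap date-range scans
    SHARD KEY (site_id)       -- we filter by site, so shard by site
)
"""

QUERY = """
SELECT post_id, SUM(pageviews) AS views
FROM post_metrics
WHERE site_id = %s AND metric_date >= %s
GROUP BY post_id
ORDER BY views DESC
LIMIT 20
"""

with conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute(QUERY, (42, "2023-12-01"))
    top_posts = cur.fetchall()
```

Because SingleStore speaks the MySQL wire protocol, a standard client like pymysql is enough to try this kind of thing out.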

It took us roughly 4 weeks to complete the initial migration to SingleStore (fun fact, the migration was officially completed on December 22nd, my birthday. if you’re ever curious what to get me for my birthday – stable and cost-efficient infrastructure is a great gift). After that, it took another 4 weeks or so of tweaking to get it into good shape. Of course, even now, we continue to tweak and improve across both AWS and SingleStore to keep our costs under control and keep things running smoothly for our customers.

What about the Redis and EKS migrations? What are those and why were they important?

Moving from EC2/SQS to EKS/Redis

Again, possible misnomers and technical misunderstandings on my part aside – this was a vital move primarily for cost predictability. With our costs spiraling out of control, we needed something for our servers that was going to be a controlled, predictable cost.

Moving to EKS (Elastic Kubernetes Service – if you haven’t heard of Kubernetes, it’s just a thing…that does…stuff. I promise if you Google it and learn about it, it will not make your life any better) allowed us to essentially control, restrain, and predictably scale our resources by containerizing our servers. I’m nerding out here above my weight class, so I may invite our engineers to write more on this later. But in terms of dollars and cents, migrating to EKS + Redis got our computation costs down to a fixed rate of around $280/month, rather than the massively variable rates that were way higher than that.
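For the nerds still reading, here’s a rough, hypothetical sketch – the host and queue name are made up – of what one of those containerized background workers looks like: a small loop that blocks on a Redis list and chews through jobs, running inside a pod whose CPU and memory are capped by EKS, which is what makes the cost a known quantity.

```python
# Hypothetical sketch of a Redis-backed worker of the kind that replaced our
# SQS consumers: it blocks on a Redis list and processes jobs one at a time.
# Run as a container on EKS, its CPU/memory are bounded by the pod's resource
# limits. Host, queue name, and job shape are illustrative only.
import json
import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)  # placeholder host
QUEUE = "example:jobs"  # placeholder queue name

def process(job: dict) -> None:
    # Whatever heavy lifting the app shouldn't do inline.
    print(f"processing {job['job']} for post {job['post_id']}")

while True:
    item = r.brpop(QUEUE, timeout=5)  # block up to 5s waiting for a job
    if item is None:
        continue                      # no work yet; keep waiting
    _key, payload = item
    process(json.loads(payload))
```

Same idea as the SQS version earlier, except the queue now lives on infrastructure with a flat price instead of per-request metering.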

At the end of the day, the EKS and Redis migrations were primarily about controlling costs, but in the days since, we’ve also noticed improved performance for our customers inside Otherboard, which is the ultimate win.

Thanks for hanging with us this far! We’re constantly learning, growing, and improving Otherboard as a service for our customers, but also as a sustainable business. We’ll continue to share what we learn along the way. 🚀 🚀 🚀

Editor’s Note: Endless thanks to Jack Ellis of Fathom Analytics. His transparency, educational resources, and trailblazing have inspired, and continue to inspire, us as we build Otherboard into the world’s best content management solution for WordPress blogs.
