Wednesday, March 17, 2021

Why We Didn't Use AWS Lambda: EC2 Fits Better (For Our Golang App)

This is going to generate some hate.  But technology is complicated.  You *will* be able to slice the numbers differently with different assumptions. We are laying out how *we* chose to make decisions for us.  As a self-funded startup we had to make some early choices (without the luxury of staff to do a 6-month cost-benefit analysis).   This is how we spent an afternoon deciding on infrastructure, using logic guided by experience.

On our recent post about how and why we decided to avoid Docker (https://launchyourapp.meezeeworkouts.com/2021/03/why-we-dont-use-docker-we-dont-need-it.html), one of the interesting questions that came up was whether we had considered serverless options.

Of course we did!  We decided against it for now. Here's why.

Disclaimer #2: there is a lot of variability in benchmarking, especially against a virtual machine like an Amazon EC2 t2.micro instance, which is a slice of a larger physical machine that is probably oversubscribed to multiple clients on the assumption they won't all pound it simultaneously.  What we mostly wanted to do was validate that our app wouldn't fall over under unexpectedly high load, and that we had room for tuning (and growth) if need be.

Our benchmarks showed that, hitting our largest web service, we could sustain 50 tps fairly easily on an Amazon t2.micro instance (the free-for-12-months version).  The specs on a t2.micro instance are 1 burstable vCPU and 1 GiB of RAM:

AWS EC2 offerings (t2.micro)

When we used siege to throw 50 tps at our largest web service, CPU usage peaked at about 10%, which is the sustained baseline limit on the free EC2 tier (yes, top is a horrible tool for accurate benchmarks -- again, we're just looking for back-of-the-envelope numbers here).

Output from top showing 10% CPU usage
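
If you want to reproduce this kind of back-of-the-envelope test without installing siege, a minimal Go load generator along these lines does the job. The URL is hypothetical and the rate and duration are just the numbers from our test above; this is a sketch, not how we actually benchmarked:

```go
// Minimal load generator sketch: fire a fixed request rate at one endpoint
// and count successes, purely for back-of-the-envelope capacity checks.
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	const (
		targetURL = "https://example.com/api/workouts" // hypothetical stand-in for our largest web service
		rate      = 50                                 // requests per second
		duration  = 60 * time.Second
	)

	var ok, failed int64
	client := &http.Client{Timeout: 5 * time.Second}

	ticker := time.NewTicker(time.Second / rate) // one request every 20ms = 50 tps
	defer ticker.Stop()

	deadline := time.Now().Add(duration)
	for time.Now().Before(deadline) {
		<-ticker.C
		go func() {
			resp, err := client.Get(targetURL)
			if err != nil {
				atomic.AddInt64(&failed, 1)
				return
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				atomic.AddInt64(&ok, 1)
			} else {
				atomic.AddInt64(&failed, 1)
			}
		}()
	}

	time.Sleep(5 * time.Second) // let in-flight requests drain
	fmt.Printf("ok=%d failed=%d (~%d tps over %v)\n",
		atomic.LoadInt64(&ok), atomic.LoadInt64(&failed),
		atomic.LoadInt64(&ok)/int64(duration/time.Second), duration)
}
```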
 

Most of our CPU time was actually spent in our nginx reverse proxy (8.3%); only 1.3% went to our actual golang web service.  Yes, golang is fast and efficient.  Yes, nginx was doing all the TLS.  With 90% of the CPU still available for bursting, this made us happy.  This is so far above what we anticipate needing right now that the only remaining question, in the spirit of the Last Responsible Moment philosophy, is whether we are depriving ourselves of good options later.  Nope! Check. Move on.  Still, here are the options for more headroom, roughly:

  • Option 1: remove nginx and put our SSL certificates directly in golang (see the sketch after this list). // Free except for labor.  But we might be able to improve performance by an order of magnitude -- say 500 tps -- and still stay within that 10% baseline.
  • Option 2: increase to the next tier up. // Costs AWS money
  • Option 3: put our DR server into load balanced rotation. // Costs AWS money and/or money somewhere else for a load balancer.
  • Option 4: deploy to AWS Lambda.
  • Option 5: deploy to some *really* fast hardware like a Raspberry pi. ;)
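
For the curious, here's a rough sketch of what Option 1 looks like in Go.  The handler, port, and certificate paths are hypothetical; the point is simply that net/http and crypto/tls can terminate TLS in-process in a few lines:

```go
// Option 1 sketch: let the Go binary terminate TLS itself instead of nginx.
package main

import (
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/workouts", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"status":"ok"}`)) // stand-in for the real service
	})

	// ListenAndServeTLS does the TLS handshake in-process, so the CPU that
	// nginx was spending on TLS moves into the Go runtime instead.
	log.Fatal(http.ListenAndServeTLS(":443",
		"/etc/ssl/certs/example.com.pem",   // hypothetical cert path
		"/etc/ssl/private/example.com.key", // hypothetical key path
		mux))
}
```

The TLS work doesn't disappear, it just moves from nginx into the Go process; whether that actually buys the order-of-magnitude improvement we guessed at above is something we'd have to benchmark.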

Since we are a self-funded startup, we go with option 0: ignore all of these for now, because we made a good decision going with a free t2.micro + golang (+ DR and DNS failover).

So, why didn't we go with AWS Lambda?

We considered it.  Those 1,000,000 free transactions a month sure are attractive.  But let's do some basic math:

  • Seconds in a day: 86,400
  • Days in a month:  30 (good enough for back of the envelope)
  • Seconds per month: 2,592,000

So that means that at the point we hit 0.38 transactions per second (1,000,000 / 2,592,000), we have to start paying AWS for Lambda.  (And that's not even counting compute time... there's a convoluted surcharge for Lambda CPU time on top of the per-request pricing.)

If we can hit 50 transactions per second on a t2.micro instance, that means we can do ~131 (50 / 0.38) times more transactions per month for free than we can with AWS Lambda, without even performance tuning.  Yes, AWS Lambda is still pretty cheap.  But free is better than cheap, isn't it?  With this math we figure we can scale our workout app to 10% of the US population on a single t2.micro instance, and not have to worry about counting calories (or compromising our code readability to optimize our transaction count down to fit the Lambda billing model).

Or, to put it another way, skipping the rounding in that division, here's the transaction count we can do for free each month: 50 tps x 2,592,000 seconds = 129,600,000 transactions.

So yes, we can do 1 million transactions per month for free on Lambda, or (rounding down) 100 million for free on a t2.micro instance.  With plenty of burstable overhead on top.
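
If you want to check our arithmetic, here is the whole back-of-the-envelope calculation in a few lines of Go.  The 50 tps figure is our own siege measurement and the 1,000,000 requests per month is Lambda's free tier; everything else falls out of the multiplication:

```go
// Back-of-the-envelope comparison of the Lambda free tier vs. one t2.micro.
package main

import "fmt"

func main() {
	const (
		secondsPerMonth = 86_400 * 30 // 2,592,000
		lambdaFreeReqs  = 1_000_000   // Lambda free-tier requests per month
		measuredTPS     = 50          // what our t2.micro handled at its 10% CPU baseline
	)

	breakEvenTPS := float64(lambdaFreeReqs) / float64(secondsPerMonth)
	monthlyOnEC2 := measuredTPS * secondsPerMonth

	fmt.Printf("Lambda free tier runs out at %.3f tps sustained\n", breakEvenTPS)              // ~0.386
	fmt.Printf("t2.micro at 50 tps handles %d transactions/month\n", monthlyOnEC2)             // 129,600,000
	fmt.Printf("that's ~%.0fx the Lambda free tier\n", float64(monthlyOnEC2)/lambdaFreeReqs)   // ~130x
}
```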

YMMV, especially if you use something like Java (which will have a hard time running in 1 GB of RAM), or if you have database needs.

Conclusion: modern wisdom seems to be that if you want to scale, you need to design for the cloud from the ground up and invest heavily in "cloud-ready" technologies.  Well, we're running in the cloud.  With enough scale to grow our business enormously.  Using a very simple stack with few moving parts.  Because the more moving parts, the more parts there are to break.  This makes us happy.

Point #1: modern computers are amazingly fast. Blindingly fast. We would have killed for the performance of a single t2.micro instance 20 years ago to shove strings across http(s).

Point #2: yes, you can do the math differently for non-free t2.micro instances vs. Lambda.  We understand that you ultimately hit a performance wall on a single server and that Lambda will scale further.  We don't have that problem and probably never will.  Are you sure you do, or will?  100 million transactions per month.  Think about that number.  That's roughly 3 transactions a month from 10% of the US population.  That's a *huge* number for a single server unless you are one of a handful of companies.

Point #3: in one of our past lives, we spent 4 man-years developing a system for managing internal chargebacks for mainframe CPU time at a large US bank.  Managing and forecasting CPU time gets *expensive*.  Once you tie yourself to usage-based billing systems, they have a tendency to metastasize in unexpected ways.

Point #4: we are grumpy old developers. We like simple.

EDIT: updated after some astute redditors pointed out AWS burstable limitations: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html.  In short, under sustained load a t2.micro instance is throttled to 10% of its 1 vCPU (it earns 6 CPU credits per hour, each credit being one minute of a full vCPU, so 6/60 = 10%).  So our original calculations were off by an order of magnitude.  Yay, we learned something new.

If you found this interesting, please comment below.  Or give our fitness app a try (iOS only):

https://meezeeco.com/getapp

Or if you're an enterprise and you really like what we have to say, you could bring us on to help you with your devops and development practices. Our email is engagement at meezeeco.com.
