
How ipdata serves 25M API calls from 10 infinitely scalable global endpoints for $150 a month

This is a guest post by Jonathan Kosgei, founder of ipdata, an IP Geolocation API. 

I woke up on Black Friday last year to a barrage of emails from users reporting 503 errors from the ipdata API.

Our users typically call our API on each page request on their websites to geolocate their users and localize their content. So this particular failure was directly impacting our users’ websites on the biggest sales day of the year. 

I only lost one user that day but I came close to losing many more.

This sequence of events, and its inexplicable nature (CPU, memory and I/O were nowhere near capacity), together with concerns about how well, if at all, we would scale given our outage, were a big wake-up call to rethink our existing infrastructure.

Our Tech stack at the time

  • Japronto Python Framework 
  • Redis
  • AWS EC2 nodes
  • AWS Elastic Load Balancers
  • Route53 Latency Based Routing 

I had run tests on several new, promising Python micro-frameworks. 

Choosing between `aiohttp`, `sanic` and `japronto`, I settled on Japronto after benchmarking the three using https://github.com/samuelcolvin/aiohttp-vs-sanic-vs-japronto and finding it to have the highest throughput.

The API ran on 3 EC2 nodes in 3 regions behind ELB load balancers, with Route53 latency-based routing sending requests to the region closest to the user to ensure low latency.

Choosing a new Tech stack

An Example Weather API using our current stack


Around this time I started to seriously look into using API Gateway with AWS Lambda given their:

  1. Favorable pricing: about $3.50 per million calls on API Gateway and $0.20 per million invocations on AWS Lambda. 
  2. Infinite scale and high throughput: the account limit on API Gateway is 10,000 requests per second, or about 864M calls daily (a limit that can be raised by opening a support request).

This also made it economically viable to have endpoints in numerous AWS regions to provide low latencies to all our users all over the globe.
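As a rough back-of-envelope (using only the two per-million prices quoted above; DynamoDB, CloudWatch and data transfer are extra), 25M monthly calls comes out to:

```python
# Back-of-envelope monthly cost at 25M calls, using the per-million
# prices quoted in the post (API Gateway $3.50/M, Lambda $0.20/M).
calls_millions = 25
api_gateway = calls_millions * 3.50   # $87.50 for API Gateway requests
lambda_cost = calls_millions * 0.20   # $5.00 for Lambda invocations
total = api_gateway + lambda_cost
print(f"${total:.2f}")  # → $92.50
```

The remainder of the ~$150/month goes to the other services in the stack.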

Designing a multi-regional API Gateway API

There were a number of architectural challenges that had to be solved to make this viable.

  1. Each Lambda function in each region needed to be able to look up usage data in a database in the same region, to minimize latency.
  2. I needed to figure out a way to count the number of API calls made by each IP address, Referer and API key.
  3. A means to sync the usage data across all regions. For example, if Route53 sent 10,000 requests to our Sydney endpoint and then sent the next 50,000 to our Seoul endpoint (depending on which had the least network latency at that point in time), each Lambda function would need to know that the user had made 60,000 requests in total to properly handle rate limiting.
  4. Authorization: API Gateway provides usage plans and API key generation, and lets you link an API key to a usage plan, with the added advantage that you don't get charged for requests users make beyond their quotas. However, I couldn't use this because it was important to me to provide a free tier with no sign-up and no credit card required.
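Challenge 3 above boils down to summing per-region counts so every Lambda sees a user's global total. A hypothetical sketch, mirroring the Sydney/Seoul numbers (the dict stands in for what cross-region replication achieves; none of this is actual AWS code):

```python
# Usage recorded independently in each region for the same identifier.
regional_usage = {
    "ap-southeast-2": {"user-42": 10_000},  # Sydney endpoint
    "ap-northeast-2": {"user-42": 50_000},  # Seoul endpoint
}

def global_usage(identifier: str) -> int:
    """Sum an identifier's usage across every region's table."""
    return sum(counts.get(identifier, 0) for counts in regional_usage.values())

print(global_usage("user-42"))  # 60000
```

Replication makes this sum available locally in every region, so no cross-region call is needed at request time.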

With quite a bit of work, I was able to solve these problems in creative ways. 

Accessing the usage data locally (for each lambda function)

The obvious solution for this was DynamoDB: it was cost-effective at scale and fast, with the first 200M requests per month free.

DynamoDB also provides consistently low read latencies of 1–2 ms.

And this can be sped up into the microsecond range with DynamoDB Accelerator (DAX).

DAX takes performance to the next level with response times in microseconds for millions of requests per second for read-heavy workloads.
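DAX is, in effect, a managed read-through cache in front of DynamoDB. A minimal sketch of those semantics, with a plain dict standing in for the cache and a stub for the table read (all names are illustrative):

```python
cache: dict = {}

def slow_table_read(key: str) -> str:
    """Stand-in for a ~1-2 ms DynamoDB read."""
    return f"record-for-{key}"

def cached_read(key: str) -> str:
    """Read-through: misses fall to the table, hits serve from memory."""
    if key not in cache:
        cache[key] = slow_table_read(key)
    return cache[key]

cached_read("user-42")            # first call populates the cache
print(cached_read("user-42"))     # record-for-user-42, served from cache
```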

Collecting usage data for all identifiers

The next challenge was how to count in real time the number of requests made per IP address, Referer or API key. 

The simplest, most direct way to do this would be to update a count in a DynamoDB table on each call.

However, this would add a database write to every API call, potentially introducing significant latency.

I was able to find a simple and elegant solution to this:

  1. First, print a log (a JSON object) with all the request identifiers on each request, that is the IP address, Referer and API key if present. Really just `print(event)`.
  2. Add a CloudWatch Subscription Filter to the CloudWatch Log Stream of each Lambda function in each region and push all the logs into a Kinesis stream. This would allow me to process log events from every region in a central place. I chose Kinesis over SQS (Amazon's Simple Queue Service) because of the ability to play back events; SQS deletes an event as soon as a consumer reads it, and I wanted the ability to recover from node failures and data loss.
  3. Read from the Kinesis stream and update a local DynamoDB instance with the usage data.
  4. Use the DynamoDB Cross-Region Replication Library to stream all changes from my local DynamoDB instance to the tables in all regions in real time.
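The consumer side of the pipeline above can be sketched as follows: each printed JSON event arrives via Kinesis and is folded into per-identifier counts. The field names ("ip", "referer", "api_key") are assumptions for illustration, not ipdata's actual log schema:

```python
import json
from collections import Counter

# Running usage tally, keyed by (identifier_type, identifier_value).
usage = Counter()

def process_log_record(raw: str) -> None:
    """Parse one JSON log event and count each identifier it carries."""
    event = json.loads(raw)
    for field in ("ip", "referer", "api_key"):
        value = event.get(field)
        if value:
            usage[(field, value)] += 1

process_log_record('{"ip": "203.0.113.7", "referer": "example.com"}')
process_log_record('{"ip": "203.0.113.7", "api_key": "abc123"}')
print(usage[("ip", "203.0.113.7")])  # 2
```

In production these counts would be written to the local DynamoDB instance (step 3) rather than held in memory.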

Authenticating Requests

I handle this by replicating keys to every region on signup, so that no matter which endpoint a user hits, the Lambda function behind it can verify their key against its local DynamoDB table within a millisecond. The same record stores the user's plan quota, so a single read both verifies the key and fetches the quota to compare usage against when deciding whether to accept or reject the request.
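A minimal sketch of that single-read check, with a dict standing in for the local DynamoDB table (the field names are assumptions, not ipdata's actual schema):

```python
# Replicated key table: one record carries both validity and quota,
# so authentication and rate limiting cost a single read.
LOCAL_KEYS = {
    "key-abc": {"plan_quota": 250_000, "usage": 240_000},
}

def check_request(api_key: str) -> bool:
    """Verify the key and enforce its plan quota in one lookup."""
    record = LOCAL_KEYS.get(api_key)   # one read: key + quota together
    if record is None:
        return False                   # key never replicated: invalid
    return record["usage"] < record["plan_quota"]

print(check_request("key-abc"))   # True: valid key, under quota
print(check_request("key-xyz"))   # False: unknown key
```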

How this has fared 

Today we serve 25M API calls monthly, about 1M calls daily. 

The majority of them complete in under 30 ms, providing the fastest IP geolocation lookup over SSL in the industry. 


Our Status Page

Latency is pretty much the biggest reason developers shy away from using third-party APIs for GeoIP lookups.

However our low latencies and redundant global infrastructure are slowly drawing large businesses to our service. 



Lessons learned

  1. CloudWatch can be surprisingly costly, and not because of log storage: we only keep CloudWatch logs for 24 hours. Alarms, metrics and CloudWatch API requests can really add up. 
  2. On API Gateway, the more requests you get, the lower your latencies will be, due to fewer cold starts. Because of this I've seen latencies as low as 17 ms in our busiest region (Frankfurt) versus around 40 ms in less busy regions such as Sydney. 
  3. DynamoDB is fast and will cost you less than you think (or not, see https://segment.com/blog/the-million-dollar-eng-problem/). I initially thought I'd get charged for the number of RCUs and WCUs I provision. However, billing seems to be per usage: if you provision 1,000 RCUs and 1,000 WCUs but only use 5 of each, you'll only get charged for your usage. This aspect of DynamoDB pricing was a bit tough to wrap my head around at the beginning.
  4. Increasing your Lambda RAM allocation can halve your execution time and make response times more consistent (as well as double your costs!).
  5. Kinesis has proven to be very reliable under high throughput, relaying all our log events for processing in near real time.
  6. DynamoDB Local is only limited by your system resources, which makes it great for running table scans or queries (for example when generating reports) that would otherwise be expensive to run against AWS's DynamoDB. Keep in mind that DynamoDB Local is really just a DynamoDB wrapper around SQLite :). It's useful and convenient for our use case, but might not be for yours.


Recent AWS announcements

  • AWS announced DynamoDB Global Tables at re:Invent last year, which syncs writes across tables in all regions. We're currently not moving to this, as it's only available in 5 regions.
  • Amazon also introduced custom authorizers of the REQUEST type, which would potentially allow you to rate limit by IP address, as well as by any header, query or path parameter.


On HackerNews

Reader Comments (17)

Your DynamoDB cost surprise was because by default 'auto scaling' to read/write volume is enabled. As long as you don't have a large spike, you should never see a provision error...

April 2, 2018 | Unregistered CommenterDaniel Greene

I'm confused regarding this statement:

I initially thought I’d get charged per the number of RCUs and WCUs I’d provision. However billing seems to be only par usage, so if you provision 1000 RCUs and 1000 WCUs but only use 5 RCUs and WCUs you’ll only get charged for your usage. This aspect of Dynamodb pricing was a bit tough to wrap my head around at the beginning.

The pricing page (https://aws.amazon.com/dynamodb/pricing/) states that you are charged for the throughput you provision

Maybe you are not being charged because you are still within the free tier?

April 3, 2018 | Unregistered CommenterCaDs

Hey Cads,

I completely understand the confusion and I thought the same as you.

However I'm pretty sure we're not on the free tier, because we are getting charged for Dynamodb.

And we're only being charged for our usage and nowhere near the provisions on the table.

It seems counter to what Amazon's pricing page says but it's what I've seen reflected on my AWS bill.

April 3, 2018 | Unregistered CommenterJonathan

Thanks for the interesting article. I have one question regarding the local DynamoDB: where do you run this? As an EC2 instance?

April 3, 2018 | Unregistered CommenterTobi

Hi Tobi,

Thanks! I actually run it on an Azure node. It could however run on DO or EC2 as a spot instance.

April 4, 2018 | Unregistered CommenterJonathan

The graph is not sincere, obviously. There's never this big a difference. In this case it's probably because the author is just testing pipelining vs no pipelining. Also he limited the Go prog to 1 CPU. Also, the repo does not exist anymore. I call bullshit on the whole "benchmark".

BTW, Japronto is mostly C, check the code.

(I'm biased towards Go, but it's not like node and others can't handle themselves)

April 4, 2018 | Unregistered CommenterØyvind

I remember there were quite a few similar comments on the Medium article, see https://medium.freecodecamp.org/million-requests-per-second-with-python-95c137af319

Also I was able to find this other benchmark that finds japronto to be second to Rust

April 5, 2018 | Unregistered CommenterJonathan

25M requests a month is about 10 requests per second...

What about peak requests per second?

April 9, 2018 | Unregistered CommenterEsko

Nicely written article. Thanks for sharing your experience. I have question regarding the following part :

"First, print a log (a JSON object) with all the request identifiers on each request. That is the IP address, Referer and API key if present. Really just; ```print(event) ``` "

Just wondering why didn't you push these messages directly to kinesis? Why logs first?

April 10, 2018 | Unregistered CommenterKaivalya

Esko, API Gateway can handle peaks of 10k req/s

April 11, 2018 | Unregistered CommenterJonathan

Kaivalya, that would have introduced latency for users, printing the log was the cheapest (in terms of latency) and simplest solution.

April 11, 2018 | Unregistered CommenterJonathan

>>SQS deletes the event as soon as a consumer reads it.
This is not true, once you read a message from SQS, it is not visible to anyone till visibility timeout. You need to acknowledge this read to SQS so it can delete the message. What we generally do is process the message and then only do ack as that way if processing instance goes down we still have our message intact. Just need to tune visibility timeout to make sure it gives sufficient time.

Great article. Thanks for sharing this with community.

April 15, 2018 | Unregistered CommenterMitesh Sharma

Hi, how did you handle the 1,000 Lambda calls per second limit? Architects at AWS advised us not to use Lambda on most projects that have high-usage, high-concurrency scenarios. They said it's better only for small to medium size projects and that it does not have infinite scale like you said. One big problem we were told about by the AWS team is that it will exhaust all the available IP addresses within your VPC when running at max. Calls beyond the 1,000 limit also just get rejected and your clients get errors.

April 18, 2018 | Unregistered CommenterJohn

To calculate the number of our concurrent executions we use the formula:

events (or requests) per second * function duration

As documented at https://docs.aws.amazon.com/lambda/latest/dg/scaling.html

Since this post went live we've averaged 2M API calls a day, about 12 per second. Our maximum Lambda invocation time is 20 ms. Using those values in the above formula:

12 * 0.020

We get a concurrency level of 0.24.

This means we could grow to serve 50,000 requests a second within the current 1,000 concurrent Lambda execution limit, about 4.32 billion requests a day.
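The same arithmetic, worked through in code with the numbers from this comment:

```python
# Lambda concurrency = requests per second * function duration.
requests_per_second = 12
duration_ms = 20                      # maximum invocation time, 20 ms

concurrency = requests_per_second * duration_ms / 1000
print(round(concurrency, 2))          # 0.24

# Headroom under the default 1,000 concurrent-execution limit:
max_rps = 1000 * 1000 // duration_ms  # 1,000 executions / 0.020 s each
print(max_rps)                        # 50000
```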

April 22, 2018 | Unregistered CommenterJonathan

Are there multiple subscribers to the Kinesis stream events or a single subscriber?
Just wondering why cross-regional DynamoDB replication is needed if all data is stored in one central (local) DynamoDB instance.

July 23, 2018 | Unregistered CommenterTej

So you are paying $150 for 9 req/s?

January 29, 2019 | Unregistered Commentertheo

For license key check, have you thought of using a Bloom or Cuckoo filter? You can hold the filter in memory and avoid a database call.

March 4, 2019 | Unregistered CommenterDouglas
