Wednesday, May 7, 2014

Update on Disqus: It's Still About Realtime, But Go Demolishes Python

Our last article on Disqus, How Disqus Went Realtime With 165K Messages Per Second And Less Than .2 Seconds Latency, was a little out of date. The folks at Disqus have been busy implementing, not talking, so we don't know a lot about what they are doing now. We do have a short update in C1MM and NGINX by John Watson and an article, Trying out this Go thing.

So Disqus has grown a bit:

  • 1.3 billion unique visitors
  • 10 billion page views
  • 500 million users engaged in discussions
  • 3 million communities
  • 25 million comments

They are still all about realtime, but Go replaced Python in their Realtime system:

  • The original Realtime backend was written in fairly lightweight Python + gevent.
  • The realtime service is a hybrid of CPU intensive tasks + lots of network IO. Gevent was handling the network IO without an issue, but at higher contention, the CPU was choking everything. Switching over to Go removed that contention, which was the primary issue that was being seen.
  • Still runs on 5 Nginx machines.
    • Uses NginxPushStream, which supports EventSource, WebSocket, Long Polling, and Forever Iframe.
    • All users are connected to these machines.
    • On a normal day each machine sees 3200 connections/s, 1 million connections, 150K packets/s TX and 130K packets/s RX, and 150 Mbit/s TX and 80 Mbit/s RX, with <15ms delay end-to-end (which is faster than JavaScript can render a comment).
    • Had many issues with resource exhaustion at first. The Nginx and OS configurations given in the talk help alleviate these problems by tuning for a scenario with many connections moving little data.
  • Ran out of network bandwidth before anything else. 
    • Using 10 gigabit network interface cards helped a lot. 
    • Enabling gzip helped a lot, but Nginx preallocates a lot of memory per connection for gzip, which is overkill since comments are small. Reducing the Nginx buffer sizes reduced out-of-memory problems.
  • As message rates increased, peaking at 10k+ messages per second, the machines maxed out, and end-to-end latency went to seconds, and to minutes in the worst case.
  • Switched to Go. 
    • Liked Go because of its performance, native concurrency, and familiarity for Python programmers.
    • In only a week a replacement system was built with impressive results:
      • End-to-end latency is, on average, less than 10ms.
      • Currently consuming roughly 10-20% of available CPU. A huge reduction.
    • Node was not selected because it does not handle CPU-intensive tasks well.
    • Go does not directly access the database. It consumes a queue from RabbitMQ and publishes to the Nginx frontends.
    • A Go framework is not being used. This is a tiny component and the rest of Disqus is still Django.
  • They wanted to use resources better, not add more machines:
    • For the amount of work that was being done, we didn't want to horizontally scale more. Throwing more and more hardware at a problem isn't always the best solution. In the end, having a faster product yields its own benefits as well.

Reader Comments (6)

Interested in the comment, "Go does not directly access the database. It consumes a queue from RabbitMQ and publishes to the Nginx frontends" but I'm having a hard time picturing exactly what's going on there. Would it be possible to flesh that out a bit more?

May 7, 2014 | Unregistered Commenter Richard

Richard: The linked talk is probably your best bet for more detail, but I think the idea is that Disqus' realtime system gets a new comment in via RabbitMQ and sends notifications out via nginx--it doesn't hold long-term state like a database. Since we're talking about, essentially, the notification system here, that makes some sense.

May 7, 2014 | Unregistered Commenter Randall

I am with Richard here. Curious about what happens in case of failure.

May 8, 2014 | Unregistered Commenter Victor

So, you replaced Django (ORM) with a stream queue and your conclusion is that Go outperforms Python? How is it even related to the programming language?

May 9, 2014 | Unregistered Commenter Velko

usually when people change languages, they also change the architecture of the app

but it's hard to see what % of the speedup came from just the language

May 9, 2014 | Unregistered Commenter riksi

@Velko, no that is not the case. Find out more about what parts changed by reading the post about their previous setup.

May 11, 2014 | Unregistered Commenter mf
