advertise
Wednesday
Jul292015

A Well Known But Forgotten Trick: Object Pooling

This is a guest repost by Alex Petrov. Find the original article here.

Most problem are quite straightforward to solve: when something is slow, you can either optimize it or parallelize it. When you hit a throughput barrier, you partition a workload to more workers. Although when you face problems that involve Garbage Collection pauses or simply hit the limit of the virtual machine you're working with, it gets much harder to fix them.

When you're working on top of a VM, you may face things that are simply out of your control. Namely, time drifts and latency. Gladly, there are enough battle-tested solutions, that require a bit of understanding of how JVM works.

If you can serve 10K requests per second, conforming with certain performance (memory and CPU parameters), it doesn't automatically mean that you'll be able to linearly scale it up to 20K. If you're allocating too many objects on heap, or waste CPU cycles on something that can be avoided, you'll eventually hit the wall.

The simplest (yet underrated) way of saving up on memory allocations is object pooling. Even though the concept is sounds similar to just pooling objects and socket descriptors, there's a slight difference.

When we're talking about socket descriptors, we have limited, rather small (tens, hundreds, or max thousands) amount of descriptors to go through. These resources are pooled because of the high initialization cost (establishing connection, performing a handshake over the network, memory-mapping the file or whatever else). In this article we'll talk about pooling larger amounts of short-lived objects which are not so expensive to initialize, to save allocation and deallocation costs and avoid memory fragmentation.

Object Pooling

Click to read more ...

Monday
Jul272015

Algolia's Fury Road to a Worldwide API Part 3

The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't built in a day, our infrastructure wasn't as well. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order to you to understand how we used them to improve our architecture.

The first blog post of this series focused on our early days in beta and the second post on the first 18 months of the service, including our first outages. In this last post, I will describe how we transformed our "startup" architecture into something new that was able to meet the expectation of big public companies.

Step 11: February 2015

Launch of our Synchronized Worldwide infrastructure

Click to read more ...

Friday
Jul242015

Stuff The Internet Says On Scalability For July 24th, 2015

Hey, it's HighScalability time:


Walt Disney doesn't mouse around. Here's how he makes a goofy business plan.

 

  • 81%: AWS YOY growth; 400: hours of video uploaded to YouTube EVERY MINUTE; 9,000: # of mineable asteroids near earth; 1,400: light years to Earth's high latency backup node; 10K: in the future hard disks will be this many times faster 
  • Quotable Quotes:
    • @BenedictEvans: Chinese govt: At the end of 2014 China had 112.7 billion static webpages and 77.2 billion dynamic webpages. They used 9,310,312,446,467 KB
    • Michael Franklin (AMPLab): This is always a pendulum where you swing from highly distributed to more centralized and back in. My guess is there’s going to be another swing of the pendulum, where we really need to start thinking about how do you distribute processing throughout a wide area network.
    • Sherlock Holmes: Singularity is almost invariably a clue. 
    • @jpetazzo: OH: "In any team you need a tank, a healer, a damage dealer, someone with crowd control abilities, and another who knows iptables"
    • Jeff Sussna: Ultimately, the impact of containers will reach even beyond IT, and play a part in transforming the entire nature of the enterprise. 
    • @CarlosAlimurung: Impressive.  The number of #youtube channels making six figures grew by 50%. 
    • harlowja: Overall, no the community isn't perfect, yes there are issues, yes it burns some people out, but software isn't rainbows and butterflies after all.
    • werner: BTW nobody wants eventual consistency, it is a fact of live among many trade-offs. I would rather not expose it but it comes with other advantages ...
    • @VideoInkNews: We’re focused on our top three priorities – mobile, mobile and mobile, said @YouTube CEO @SusanWojcicki #VidCon2015 #keynote
    • Ivan Pepelnjak: Use a combination of MPLS/VPN and Internet VPN, or Internet VPN with 3G backup. Use multiple access methods, so the cable-seeking backhoe doesn’t bring down all uplinks.
    • @randybias: Repeat after me: containers do little to enable application portability.  If you want portability use a PaaS.  PaaS != Containers.
    • To see even more quotes please click through to see the rest of the post.

  • Can't we all just get along? And by "we" I mean humans and robots. Maybe. Inside Amazon shows by example how one new utopian community is bridging the categorical divide. Forget all your skepticism and technopanic, humans and robots can really work together in a highly efficient system.

  • A Brief History of Scaling LinkedIn. Not so brief actually. Lots of really good details. They of course started off with a monolith and ended up with a service oriented architecture. One of the most interesting ideas is the super block: groupings of backend services with a single access API. This allows us to have a specific team optimize the block, while keeping our call graph in check for each client.

  • If you want to move at the speed of software doesn't your datacenter infrastructure have to move at the same speed? Network Break 45 from Packet Pushers talks about an open source virtual software router, CloudRouter, running the latest release of OpenDaylight's SDN controller and ONOS. The idea is to make a dead simple router you can just instantiate as needed. Greg Ferro makes the point that if you don't have to care if you are starting 100 or 1000 virtual routers it changes how you go about building infrastructure. Running a Cisco Router, and F5 load balancer, and a virtual firewall, how much will it cost to spin up virtual datacenters for 100s of developers? How long will it take? How much will it cost? How does it even work? 

Click to read more ...

Wednesday
Jul222015

Architecting Backend for a Social Product

This is aimed towards taking you through key architectural decisions which will make a social application a true next generation social product. The proposed changes addresses following attributes; a) availability b) reliability c) scalability d) performance and flexibility towards extensions (not modifications)

Goals

a) Ensuring that user’s content is easily discoverable and is available always.

b) Ensuring that the content pushed is relevant not only semantically but also from user’s device perspective.

c) Ensuring that the real time updates are generated, pushed and analyzed.

d) Eye towards saving user’s resources as much as possible.

e) Irrespective of server load, user’s experience should remain intact.

f) Ensuring overall application security

In summary we want to deal with an amazing challenge, where we must deal with a mega sea of ever expanding user generated contents, increasing number of users, and a constant stream of new items, all while ensuring an excellent performance. Considering the above challenge it is imperative that we must study certain key architectural elements which will influence the over system design. Here are the few key decisions & analysis.

Data Storage

Click to read more ...

Tuesday
Jul212015

Sponsored Post: Redis Labs, Jut.io, VoltDB, Datadog, Tumblr, Power Admin, MongoDB, SignalFx, InMemory.Net, Couchbase, VividCortex, MemSQL, Scalyr, AiScaler, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Make Tumblr fast, reliable and available for hundreds of millions of visitors and tens of millions of users.  As a Site Reliability Engineer you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems. Apply here now! 

  • At Scalyr, we're analyzing multi-gigabyte server logs in a fraction of a second. That requires serious innovation in every part of the technology stack, from frontend to backend. Help us push the envelope on low-latency browser applications, high-speed data processing, and reliable distributed systems. Help extract meaningful data from live servers and present it to users in meaningful ways. At Scalyr, you’ll learn new things, and invent a few of your own. Learn more and apply.

  • UI EngineerAppDynamics, founded in 2008 and lead by proven innovators, is looking for a passionate UI Engineer to design, architect, and develop our their user interface using the latest web and mobile technologies. Make the impossible possible and the hard easy. Apply here.

  • Software Engineer - Infrastructure & Big DataAppDynamics, leader in next generation solutions for managing modern, distributed, and extremely complex applications residing in both the cloud and the data center, is looking for a Software Engineers (All-Levels) to design and develop scalable software written in Java and MySQL for backend component of software that manages application architectures. Apply here.

Fun and Informative Events

  • Surge 2015. Want to mingle with some of the leading practitioners in the scalability, performance, and web operations space? Looking for a conference that isn't just about pitching you highly polished success stories, but that actually puts an emphasis on learning from real world experiences, including failures? Surge is the conference for you.

  • Your event could be here. How cool is that?

Cool Products and Services

  • MongoDB Management Made Easy. Gain confidence in your backup strategy. MongoDB Cloud Manager makes protecting your mission critical data easy, without the need for custom backup scripts and storage. Start your 30 day free trial today.

  • In a recent benchmark for NoSQL databases on the AWS cloud, Redis Labs Enterprise Cluster's performance had obliterated Couchbase, Cassandra and Aerospike in this real life, write-intensive use case. Full backstage pass and and all the juicy details are available in this downloadable report.

  • Real-time correlation across your logs, metrics and events.  Jut.io just released its operations data hub into beta and we are already streaming in billions of log, metric and event data points each day. Using our streaming analytics platform, you can get real-time monitoring of your application performance, deep troubleshooting, and even product analytics. We allow you to easily aggregate logs and metrics by micro-service, calculate percentiles and moving window averages, forecast anomalies, and create interactive views for your whole organization. Try it for free, at any scale.

  • VoltDB is a full-featured fast data platform that has all of the data processing capabilities of Apache Storm and Spark Streaming, but adds a tightly coupled, blazing fast ACID-relational database, scalable ingestion with backpressure; all with the flexibility and interactivity of SQL queries. Learn more.

  • In a recent benchmark conducted on Google Compute Engine, Couchbase Server 3.0 outperformed Cassandra by 6x in resource efficiency and price/performance. The benchmark sustained over 1 million writes per second using only one-sixth as many nodes and one-third as many cores as Cassandra, resulting in 83% lower cost than Cassandra. Download Now.

  • Datadog is a monitoring service for scaling cloud infrastructures that bridges together data from servers, databases, apps and other tools. Datadog provides Dev and Ops teams with insights from their cloud environments that keep applications running smoothly. Datadog is available for a 14 day free trial at datadoghq.com.

  • Here's a little quiz for you: What do these companies all have in common? Symantec, RiteAid, CarMax, NASA, Comcast, Chevron, HSBC, Sauder Woodworking, Syracuse University, USDA, and many, many more? Maybe you guessed it? Yep! They are all customers who use and trust our software, PA Server Monitor, as their monitoring solution. Try it out for yourself and see why we’re trusted by so many. Click here for your free, 30-Day instant trial download!

  • Turn chaotic logs and metrics into actionable data. Scalyr replaces all your tools for monitoring and analyzing logs and system metrics. Imagine being able to pinpoint and resolve operations issues without juggling multiple tools and tabs. Get visibility into your production systems: log aggregation, server metrics, monitoring, intelligent alerting, dashboards, and more. Trusted by companies like Codecademy and InsideSales. Learn more and get started with an easy 2-minute setup. Or see how Scalyr is different if you're looking for a Splunk alternative or Loggly alternative.

  • SignalFx: just launched an advanced monitoring platform for modern applications that's already processing 10s of billions of data points per day. SignalFx lets you create custom analytics pipelines on metrics data collected from thousands or more sources to create meaningful aggregations--such as percentiles, moving averages and growth rates--within seconds of receiving data. Start a free 30-day trial!

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex goes beyond monitoring and measures the system's work on your MySQL and PostgreSQL servers, providing unparalleled insight and query-level analysis. This unique approach ultimately enables your team to work more effectively, ship more often, and delight more customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • aiScaler, aiProtect, aiMobile Application Delivery Controller with integrated Dynamic Site Acceleration, Denial of Service Protection and Mobile Content Management. Also available on Amazon Web Services. Free instant trial, 2 hours of FREE deployment support, no sign-up required. http://aiscaler.com

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Jul202015

Algolia's Fury Road to a Worldwide API Steps Part 2


The most frequent questions we answer for developers and devops are about our architecture and how we achieve such high availability. Some of them are very skeptical about high availability with bare metal servers, while others are skeptical about how we distribute data worldwide. However, the question I prefer is “How is it possible for a startup to build an infrastructure like this”. It is true that our current architecture is impressive for a young company:

  • Our high-end dedicated machines are hosted in 13 worldwide regions with 25 data-centers

  • our master-master setup replicates our search engine on at least 3 different machines

  • we process over 6 billion queries per month

  • we receive and handle over 20 billion write operations per month

Just like Rome wasn't built in a day, our infrastructure wasn't as well. This series of posts will explore the 15 instrumental steps we took when building our infrastructure. I will even discuss our outages and bugs in order to you to understand how we used them to improve our architecture.

The first blog post of the series focused on the early days of our beta. This blog post will focus on the first 18 months of our service from September 2013 to December 2014 and even include our first outages!

Step 4: January 2014

Click to read more ...

Friday
Jul172015

Stuff The Internet Says On Scalability For July 17th, 2015

Hey, it's HighScalability time:


In case you were wondering, the world is weird. Large Hadron Collider discovers new pentaquark particle.

 

  • 3x: Uber bigger than taxi market; 250x: traffic in HotSchedules' DDoS attack; 92%: Apple’s share of the smartphone profit pie; 7: Airbnb rejections
  • Quotable Quotes:
    • Netflix: A slow or unhealthy server is worse than a down server 
    • @inconshreveable: ngrok production servers, max GC pause: Go1.4 (top) vs Go1.5. Holy 85% reduction! /cc Go team
    • Nic Fleming: The fungal internet exemplifies one of the great lessons of ecology: seemingly separate organisms are often connected, and may depend on each other.
    • @IBMResearch: With 20+ billion transistors on new chip, that's a 50% scaling improvement over today’s tech #ibmresearch #7nm 

  • Apple and Google Race to See Who Can Kill the App First. Honest question, how are people supposed to make money in this new world? Apps are quickly becoming just an identity that ties together 10 or so components that appear integrated as part of the OS, but don't look like your app at all. Reminds me of laminar flow. We are seeing a rebirth of CORBA, COM and OLE 2, this time the container is an app bound by deep linking and some ad hoc ways to push messages around. Show developers the money.

  • The dark side of Google 10x: One former exec told Business Insider that the gospel of 10x, which is promoted by top execs including CEO Larry Page, has two sides. “It’s enormously energizing on one side, but on the other it can be totally paralyzing,”

  • Wait, are we going all RAM or all flash? So confusing. MIT Develops Cheaper Supercomputer Clusters By Nixing Costly RAM In Favor Of Flash: researchers presented evidence at the International Symposium on Computer Architecture that if servers executing a distributed computation go to disk for data even just 5 percent of the time, performance takes a hit to where it's comparable with flash memory anyway. 40 servers with 10 terabytes of RAM wouldn't chew through a 10.5TB computation any better than 20 servers with 20TB of flash memory. What's involved here is moving a little computational power off of the servers and onto the chips that control the flash drives.

  • Is disruption merely a Silicon Valley fantasy? Corporate America Hasn’t Been Disrupted: the advantage enjoyed by incumbents, always substantial, has been growing in recent years...more Americans worked for big companies...Large companies are becoming more dominant in part by buying up their rivals...Consolidation could explain at least part of the rising failure rate among startups...The startup rate has declined in every major industry, every state and nearly every city, and the failure rate’s rise has been nearly as universal. 

  • What's a unikernel and why should you care? Amir Chaudhry reveals all in his Unikernels talk given at PolyConf 15. And here's the supporting blog post. Why are we still applications on top of operating systems? Most applications are single purpose so why all the complexity? Why are we building software for the cloud the same way we build it for desktops? We can do better with Unikerels where every application is a single purpose VM with a single address space.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Thursday
Jul162015

This. Just. This.

In response to an honest comment about some of Instagram's rather "ordinary engineering choices", mikeyk (Co-founder @ Instagram) had what I consider the perfect response
We (at IG) aren't claiming to be doing revolutionary things on infrastructure--but one thing I found super valuable when scaling Instagram in the early days was having access to stories from other companies on how they've scaled. That's the spirit in which I encourage our engineers to blog about our DB scaling, our search infra, etc--I think the more open we are (as a company, but more broadly as an industry) about technical approaches + solutions, the better off we'll be.
This could be the anthem for HS and is a key reason our industry continues to get better. And in case you are interested, here are just a few of those stories from Instagram:

On HackerNews

Wednesday
Jul152015

64 Network DO’s and DON’Ts for Game Engines. Part IIIa: Server-Side 

This article originally appeared on ITHare.com. It's one article from an excellent series of articles: Part I. Client Side; Part IIa. Protocols and APIs; Part IIb; Protocols and APIs; Part IIIb. Server-Side (deployment, optimizations, and testing); Part IV. Great TCP-vs-UDP Debate; Part V. UDP; Part VI. TCP.

In Part III of the article, we’ll discuss issues specific to server-side, as well as certain DO’s and DON’Ts related to system testing. Due to the size, part III has been split, and in this part IIIa we’ll concentrate on the issues related to Store-Process-and-Forward architecture.

18. DO consider Event-Driven programming model for Server Side too

As discussed above (see item #1 in Part I), the event-driven programming is a must for the client side; in addition, it also comes handy on the server side. Having multi-threaded logic is still a nightmare for the server-side [NoBugs2010], and keeping logic single-threaded simplifies development a lot. Whether to think that multi-threaded game logic is normal, and single-threaded logic is a big improvement, or to think that single-threaded game logic is normal, and multi-threaded logic is a nightmare – is up to you. What is clear is that if you can keep your game logic single-threaded – you’ll be very happy compared to the multi-threaded alternative.

However, unlike the client-side where performance and scalability rarely pose problems, on the server side where you need to serve hundreds of thousands of players, they become really (or, if your project is successful, “really really”) important. I know two ways of handling performance/scalability for games, while keeping logic single-threaded.

18a. Off-loading

Click to read more ...

Wednesday
Jul152015

A Very Old Version of the PlentyOfFish Architecture that was Just Sold for $575 Million

PlentyOfFish was acquired by the Match Group for $575 million in cash. And it all goes to Markus Frind. Here's the story of the acquisition

Way back in 2009 I wrote architecture article on PlentyOfFish, which I'll reproduce here for historical perspective. The main theme at that time was how Markus was making great fat stacks of cash from adsense by running this huge site all by himself on a Microsoft stack.

We know the adsense goldmine played out long ago. What else has changed? We don't really know. Sometime ago we stopped getting updates on PlentyOfFish architecture changes, so that's all we have.

I doubt much remains the same however. Now 75 people work at PlentyOfFish, there are 90 million registered users, and a whopping 3.6 million active daily users, so something must be happening.

Anyway, here's the old PlentyOfFish Architecture. It still makes for interesting reading. I'm just wondering, when you get done reading, is being sold for $575 Million the ending you would expect?

PlentyOfFish Architecture

Click to read more ...