
Stuff The Internet Says On Scalability For August 1st, 2014

High Scalability

01 Aug 2014 — 5 min read

Hey, it's HighScalability time:

From Systems Performance: Enterprise and the Cloud.

Quotable Quotes:
- @shanselman: Wife: "How was your day?" Me: "I'm using Grunt to automate NuGet creation for AngularJS." Wife: "But will that scale?" Me: "Well played."
- John Hagel: winners in the concentrating parts of the economy are increasingly determined by the ability to connect with, and build strong relationships with, the participants who are operating in fragmenting parts of the economy.
- Jack Clark: This means that although Amazon still grew at a respectable rate, its actual revenues were clipped by the heightened competition. This is what happens when you sell goods with deflationary pricing, it seems.

Taxi app Hailo on Scaling micro-services Architecture on AWS: Micro-services + Containers + Scheduling on AWS will be a dominant architecture pattern in the next few years.

Netflix. Revisiting 1 Million Writes per second. How will Cassandra perform on AWS's new instance types? There's no big reveal so you'll have to decide for yourself. Good discussion on reddit and Hacker News.

TrueTime in Google's Spanner was one of its most buzzworthy innovations. Who doesn't like atomic clocks as a way to time-stamp transactions anywhere in the world? What if you don't have spare atomic clocks? Hybrid Logical Clocks: HLC captures the causality relationship like LC, and enables easy identification of consistent snapshots in distributed systems. Dually, HLC can be used in lieu of PT clocks since it maintains its logical clock to be always close to the PT clock.
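To make the HLC idea concrete, here is a minimal Python sketch of the update rules as they are usually stated: each timestamp is a pair (l, c), where l tracks the largest physical time (PT) seen so far and c is a logical counter that breaks ties, in the spirit of a logical clock (LC). The class name, method names, and the injectable `physical_clock` parameter are assumptions made for illustration, not an API from the paper.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (l, c): physical component, logical counter


def _wall_clock_ms() -> int:
    """Default physical clock: wall time in milliseconds."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Illustrative HLC; names here are hypothetical, not from the paper."""

    def __init__(self, physical_clock: Callable[[], int] = _wall_clock_ms):
        self._pt = physical_clock
        self.l = 0  # largest physical time observed so far
        self.c = 0  # counter for events that share the same l

    def now(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        prev_l = self.l
        self.l = max(prev_l, self._pt())
        # If physical time did not advance past l, bump the counter instead.
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by a received message."""
        m_l, m_c = remote
        prev_l = self.l
        self.l = max(prev_l, m_l, self._pt())
        if self.l == prev_l and self.l == m_l:
            self.c = max(self.c, m_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == m_l:
            self.c = m_c + 1
        else:
            # Local physical time is ahead of both; counter resets.
            self.c = 0
        return (self.l, self.c)
```

Because timestamps are plain tuples, comparing them with `<` gives an ordering that respects causality: a message's send timestamp is always smaller than the timestamp assigned on receipt, even when the receiver's physical clock lags behind the sender's.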

Vertical integration for the win. Apple is building out its own CDN with many terabits of capacity, capable of handling traffic bursts from software downloads. Or maybe something else? More from Dan Rayburn in Apple’s CDN Now Live: Has Paid Deals With ISPs, Massive Capacity In Place.

Clouds make a lot of money on their network pricing. Chris Swan explores this marketing magic in Cloud Price Wars – What about the network?: there haven’t been any major shifts in network pricing.

Scaling with Microservices and Vertical Decomposition: The architecture of otto.de is based on the concept of vertical decomposition: the whole system is vertically split into several loosely coupled applications. Every “vertical” is responsible for a single business domain such as “Order”, “Search & Navigation”, “Product”, etc. It has its own presentation layer, persistence layer and a separate database. From the development perspective, every vertical is implemented by exactly one team and no code is shared between the different systems.

Get something simple up and running. Useful description of a scalable 3-tier architecture: Horizontally Scaling Node.js and WebSockets with Redis. Load balancing: node-http-proxy. Messaging: Redis. Sockets: SockJS.
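The messaging tier is what lets socket servers scale horizontally: a client connected to one process can reach clients connected to another because every process publishes to, and subscribes on, a shared channel. The sketch below shows that fan-out pattern in Python with an in-memory stand-in for the broker. `InMemoryBroker`, `SocketServer`, and the `"chat"` channel name are hypothetical and are not taken from the linked article, which uses Node.js, SockJS, and Redis.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Stand-in for a pub/sub broker such as Redis (assumption for the demo)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> None:
        # Deliver to every subscribed server process, including the sender.
        for handler in list(self._subscribers[channel]):
            handler(message)


class SocketServer:
    """Stand-in for one socket server process behind the load balancer."""

    def __init__(self, name: str, broker: InMemoryBroker, channel: str = "chat") -> None:
        self.name = name
        self._broker = broker
        self._channel = channel
        self.clients: Dict[str, List[str]] = {}  # client id -> received messages
        broker.subscribe(channel, self._deliver_locally)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def on_client_message(self, message: str) -> None:
        # Never write to local sockets directly; go through the broker so
        # clients attached to other processes receive the message too.
        self._broker.publish(self._channel, message)

    def _deliver_locally(self, message: str) -> None:
        for inbox in self.clients.values():
            inbox.append(message)
```

With two `SocketServer` instances sharing one broker, a message received by the first server ends up in the inbox of a client connected to the second, which is the property that makes adding more server processes safe.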

Excellent lessons learned implementing Redis master failover. Redis Sentinel at Flickr. Flickr's task system processes millions of tasks per day (photo uploads, user notifications and metadata edits), with no more than about 2 minutes of downtime a month. Covers configuring, interfacing with, and testing Sentinel.

Isn't this just common cents? How Hackers Hid a Money-Mining Botnet in the Clouds of Amazon and Others: At the Black Hat conference in Las Vegas next month Ragan and Salazar plan to reveal how they built a botnet using only free trials and freemium accounts on online application-hosting services—the kind coders use for development and testing to avoid having to buy their own servers and storage. The hacker duo used an automated process to generate unique email addresses and sign up for those free accounts en masse, assembling a cloud-based botnet of around a thousand computers.

SendGrid has learned a few things in 5 years and 270 billion emails sent: getting a feature working is not enough, design to handle failures and to not impact customers; trust in your design, trust in your people, trust in your process, don't stress how much load your system is handling; email problems are more people problems than technology problems.

Great detail on how to make writes highly available, fault tolerant, distributed, and redundant. Multi Data Center Replication in NoSQL Databases Explained: this page is meant to highlight the simple inner workings of how Cassandra excels in multi data center replication by simplifying the problem at a single-node level.

High performance SSDs: hot, hungry & sometimes slow: This appears to be the first in-depth analysis of the power, temperature and performance of a modern high-end SSD. The news should be cautionary for system architects. The slowdown seen for large writes suggests caution when configuring SSDs for write-intensive apps. Almost by definition the performance hit will come at the worst possible time.

Very detailed NGINX Tutorial: Developing Modules. NGINX is extensible and here's how to do it. NGINX modules can be written in a number of languages. Includes lots of example code.

Why do you need fault-tolerant local state for stream processing? Why can't your data just sit in a remote database? Jay Kreps with an excellent explanation in Why local state is a fundamental primitive in stream processing. The core idea is "locality allows rich access."
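One common way to get local state that is also fault tolerant is to keep the state in the processor's own memory and record every update in a durable log, so a restarted processor can rebuild its state by replay. The Python sketch below illustrates that pattern with a per-key counter. The class name is hypothetical, the plain list stands in for a replicated log, and the whole design is an assumption for illustration, not code from the linked post.

```python
from typing import Dict, List, Tuple


class ChangelogBackedCounter:
    """Per-key event counter whose local state can be rebuilt from a log."""

    def __init__(self, changelog: List[Tuple[str, int]]) -> None:
        # The list is a stand-in for a durable, replicated changelog.
        self._changelog = changelog
        self._counts: Dict[str, int] = {}
        # Recovery: replay the log; the last value written for a key wins.
        for key, value in changelog:
            self._counts[key] = value

    def process(self, key: str) -> int:
        """Handle one input event. Reads are local, with no remote round trip."""
        new_value = self._counts.get(key, 0) + 1
        # Record the change before exposing it, so a crash cannot lose it.
        self._changelog.append((key, new_value))
        self._counts[key] = new_value
        return new_value

    def get(self, key: str) -> int:
        return self._counts.get(key, 0)
```

Constructing a second `ChangelogBackedCounter` over the same log simulates a restart: the new instance comes up with the same counts without asking a remote database for them.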

ETS scalability/performance summary #erlang #euc2013: Use pinning on NUMA, Use read_concurrency when doing only lookups, Use write_concurrency, Measure your use case when combining them.

You are thinking of moving from MySQL to RDS. What can go wrong? Lots. What I learned while migrating a customer MySQL installation to Amazon RDS: The only way to interface with RDS is through mysql client; RDS is set to UTC (system_time_zone=UTC) and this cannot be changed...and many more things to think about.

Building things is always the hard part of the Internet of Things. Ryan Vinyard in OSCON 2014 Keynote: "Open Manufacturing..." takes how the masses can build things to the next level. A very interesting approach using open source hardware.

AlBlue with a great trip report of QCon London 2014 Day 2. Lots of detail.

Signals from OSCON 2014. A list of recommended videos from the conference.

Walmart loves running JavaScript on the client and server and they've open sourced lazojs: A client-server web framework built on Node.js that allows front-end developers to easily create a 100% SEO compliant, component MVC structured web application with an optimized first page load.

A Look at Nanomsg and Scalability Protocols (Why ZeroMQ Shouldn't Be Your First Choice): That said, nanomsg's improvements and, in particular, its scalability protocols make it very appealing. A lot of the strange behaviors that ZeroMQ exposes have been resolved completely or at least mitigated.

MapGraph: Massively Parallel Graph processing on GPUs. The MapGraph API makes it easy to develop high performance graph analytics on GPUs. The API is based on the Gather-Apply-Scatter (GAS) model as used in GraphLab.
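For readers unfamiliar with Gather-Apply-Scatter, the model splits each iteration of a vertex program into three phases: gather combines values along a vertex's in-edges, apply updates the vertex from the gathered result, and scatter activates neighbors that may need to recompute. The sketch below runs single-source shortest paths in that style on the CPU in plain Python. It does not use the MapGraph or GraphLab APIs; the function name and graph representation are assumptions, and edge weights are assumed to be non-negative.

```python
import math
from typing import Dict, List, Tuple

# vertex -> list of (out-neighbor, edge weight)
Graph = Dict[str, List[Tuple[str, float]]]


def gas_shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Single-source shortest paths written as a synchronous GAS loop."""
    # Build reverse adjacency so each vertex can gather over its in-edges.
    in_edges: Dict[str, List[Tuple[str, float]]] = {v: [] for v in graph}
    for u, edges in graph.items():
        for v, w in edges:
            in_edges.setdefault(v, []).append((u, w))

    dist = {v: math.inf for v in in_edges}
    dist[source] = 0.0
    active = {v for v, _ in graph.get(source, [])}

    while active:
        # Gather: each active vertex reduces (min) over its in-edges,
        # reading only distances from the previous iteration.
        gathered = {
            v: min((dist[u] + w for u, w in in_edges[v]), default=math.inf)
            for v in active
        }
        next_active = set()
        for v, candidate in gathered.items():
            # Apply: keep the gathered distance only if it is an improvement.
            if candidate < dist[v]:
                dist[v] = candidate
                # Scatter: out-neighbors of a changed vertex must recompute.
                next_active.update(n for n, _ in graph.get(v, []))
        active = next_active
    return dist
```

The gather phase is a per-vertex reduction with no writes, which is the part of the model that maps naturally onto data-parallel hardware such as GPUs.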

mangos: package mangos is an implementation in pure Go of the SP ("Scalable Protocols") protocols. This makes heavy use of go channels, internally, but it can operate on systems that lack support for cgo. It has no external dependencies. The reference implementation of the SP protocols is available as nanomsg.

ledisdb: a high performance NoSQL powered by golang.

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing: Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.
