Stuff The Internet Says On Scalability For May 1st, 2015

Hey, it's HighScalability time:


Got containers? Gorgeous shot of the CSCL Globe (by Walter Scriptunas II), world's largest container ship: 1,313ft long; 19,000 standard containers.

  • $3000: Tesla's new 7kWh daily cycle battery.
  • Quotable Quotes:
    • @mamund: "Turns out there is nothing about HTTP that I like" --  Douglas Crockford 
    • @PeterChch: Your little unimportant site might be hacked not for your data but for your aws resources. E.g. bitcoin mining.
    • @Joseph_DeSimone: I find it stunning that Google's annual R&D budget totaled $9.8 billion and the Budget for the National Science Foundation was $7.3 billion
    • @jedberg: The new EC2 container service adds the missing granularity to #ec2
    • Randy Shoup: “Every service at Google is either deprecated or not ready yet.”  -- Google engineering proverb
    • @mtnygard: Today the ratio of admins to servers in a well-behaved scalable web companies is about 1 to 10,000. @botchagalupe #craftconf
    • @joshk: Data: There Are Over 9x More Private IPOs Than Actual Tech IPOs 
    • @nwjsmith: “Systems are not algorithms. Systems are much more complex.“ #CraftConf @skamille
    • kk: “Because the center of the universe is wherever there is the least resistance to new ideas.”
    • John Allspaw: Stop thinking that you’re trying to solve a troubleshooting problem; you’re not. Instead of telling me about how your software will solve problems, show me that you’re trying to build a product that is going to join my team as an awesome team member, because I’m going to think about using/buying your service in the same way that I think about hiring.
    • @mpaluchowski: "Netflix is a #logging system that happens to play movies." #CraftConf
    • John Wilke:  Resiliency is more important than performance.
    • @peakscale: The server/cattle metaphor rubs me the wrong way... all the farmers I knew and worked for named and cared about their herd.
    • @aphyr: "We've managed to run 40 services in prod for three years without needing to introduce a consensus system" @skamille, #CraftConf
    • @ryantomlinson: “Spotify have been using DNS for service discovery for a long time” #CraftConf
    • @csanchez: Google "we start over 2 billion containers per week" containers, containers, containers! #qconlondon 
    • @tyler_treat: If you're using RabbitMQ, consider replacing it with Kafka. Higher throughput, better replication, replayability. Same goes for other MQs.
    • @tastapod: @botchagalupe telling #CraftConf how it is! “Yelp is spinning up 8 containers a second. This is the real sh*t, man!”
    • @mpaluchowski: "A static #alert threshold won't be any good next week. It must be calculated." #CraftConf
    • @mtnygard: #craftconf @randyshoup “Microservices are an answer to a scaling problem, not a business problem.”  So right.
    • @adrianco: @mtnygard @randyshoup speed of development is the business problem that leads to Microservices.
    • @b6n: the aws financials should be a wake-up call to anyone still thinking cloud isn't a game of raw scale
    • @mtnygard: The “edge” used to be top-of-rack. Then the hypervisor. Now it’s the container. That’s 100x the number of IPs. — @botchagalupe #craftconf
    • @idajantis: 'An escalator can never break; it can only become stairs' - nice one by @viktorklang at #CraftConf on Distributed Systems failing gracefully
    • @jessitron: "You should store your data in a real database and replicate it to Elasticsearch." @aphyr #CraftConf

  • A telling difference between Google and Apple: Google Now becomes a more robust platform with 70 new partner apps. Apple takes an app-centric view of the world and Google, not surprisingly, takes a data-centric view. With Google, developers feed Google data for Google to display. With Apple, developers feed Apple apps for users to consume. On Apple, developers push their own brand and control functionality through bundled extensions, but Google will have the perspective to really let their deep learning prowess sing. So there's a real choice.

  • How appropriate that game theory is applied to cyberwarfare. Mutually Assured Destruction isn't just for nukes. Pentagon Announces New Strategy for Cyberwarfare: “Deterrence is partially a function of perception,” the new strategy says. “It works by convincing a potential adversary that it will suffer unacceptable costs if it conducts an attack on the United States, and by decreasing the likelihood that a potential adversary’s attack will succeed.”

  • Reducing big data using ideas from quantum theory makes it easier to interpret. So maybe QM is nature's way of making sense of the BigData that is the Universe?

  • Synergy is not always BS. Cheaper bandwidth or bust: How Google saved YouTube: YouTube was burning through $2 million a month in bandwidth costs before the acquisition. What few knew at the time was that Google was a pioneer in data center technology, which allowed it to dramatically lower the costs of running YouTube.

  • In a winner-take-all market, is the cost of customer acquisition Pyrrhic? Uber Burning $750 Million in a Year.

  • The cloud behind the cloud. Apple details how it rebuilt Siri on Mesos: Apple’s custom Mesos scheduler is called J.A.R.V.I.S.; Apple uses J.A.R.V.I.S. as its internal platform-as-a-service; Apple’s Mesos cluster spans thousands of nodes and runs about a hundred services; Siri’s Mesos backend represents its third generation, and a move away from “traditional” infrastructure.

  • Or why you shouldn't count on COUNT DISTINCT. How We Boosted Counting Performance by 7410x with Redis: HyperLogLog only gives an estimate, so then it was time to compare the estimates to the actual MySQL counts. For most queries, the difference was much smaller than the 0.8% error deviation (the smallest was 0.03%), but after benchmarking many different queries, we also had 2 that reached an error of 1.1% and 1.7%.
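
The 0.8% figure quoted above lines up with the usual HyperLogLog standard error of roughly 1.04/sqrt(m) for m registers; with m = 16384, the register count Redis uses, that works out to about 0.81%. As a rough illustration only, here is a toy HyperLogLog in Python. It is not the Redis implementation or the article's code: the class name, the SHA-1 based hash, and the `p` parameter are all assumptions made for this sketch.

```python
import hashlib
import math


def standard_error(p):
    """Approximate relative standard error for m = 2**p registers."""
    return 1.04 / math.sqrt(1 << p)


class HyperLogLog:
    """Toy HyperLogLog (illustrative only); p index bits give m = 2**p registers."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an arbitrary choice here).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=14)
    for user_id in range(100_000):
        hll.add(user_id)
    print("estimate:", round(hll.count()), "expected error:", standard_error(14))
```

The trade-off is the one the article describes: the sketch uses a fixed 16384 small registers no matter how many items are added, in exchange for an answer that is only approximately right.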

  • 600k concurrent websocket connections on AWS using Node.js. Technologies used: Websockets/ws; Sticky-session; M3.xlarge. And lots of specific tuning tips were given for node.js and EC2. Though it wasn't clear how much work those connections were doing.

  • What AWS Revenues Mean for Public Cloud and OpenStack More Generally: What does it mean now that we can all agree that AWS has built something fundamentally new?  A single business comparable to all the rest of the U.S. hosting market combined?  A business focused almost exclusively on net new “platform 3” applications that is growing at an unprecedented pace? It means we need to get serious about public and hybrid cloud. It means that OpenStack needs to view AWS as a partner and that we need to get serious about the AWS APIs.  It means we should also be looking closely at the Azure.

  • Craft Conf videos are now available. And if you would like a more vicarious experience, here are a few conference reports: Craft Conf 2015 - my subjective report; Craft Conference 2015 Recap; Craft Conf 2015, Day 3.

  • 5 common optimizations: Gzip assets; Make fewer HTTP requests; Add expires headers; Use a CDN; Optimized Images.
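
Two of those optimizations, gzipping assets and adding expires headers, can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not anything from the linked article: `prepare_asset` and its one-day default lifetime are made up for the example, and in practice a web server or CDN normally does this for you.

```python
import gzip
import time
from email.utils import formatdate


def prepare_asset(body: bytes, max_age_seconds: int = 86400):
    """Hypothetical helper: gzip a static asset and build caching headers for it."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        # Cache-Control is the modern mechanism; Expires is kept for older clients.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "Expires": formatdate(time.time() + max_age_seconds, usegmt=True),
    }
    return compressed, headers


if __name__ == "__main__":
    css = b"body { margin: 0; padding: 0; }\n" * 200
    payload, headers = prepare_asset(css)
    print(len(css), "bytes ->", len(payload), "bytes")
    print(headers)
```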

  • The Least Resistance to New Ideas: Many years ago the San Francisco Chronicle published a short column in which the writer mentioned that he had been traveling in India, and when he told the clerk at his hotel in New Delhi that he was from the San Francisco Bay Area the clerk responded, “Oh that is the center of the universe” Um, mumbled the traveller, and why do you say that? “Because the center of the universe is wherever there is the least resistance to new ideas.”

  • Many interesting papers at PaPoC '15: Proceedings of the First Workshop on Principles and Practice of Consistency for Distributed Data. You may like A study of CRDTs that do computations or Designing a causally consistent protocol for geo-distributed partial replication.

  • Nice Cloud Native Applications (for Dummies): the strong separation between capacity and state is one of those powerful cloud mantras that happen to drive the majority of the advantages (and challenges?) compared to traditional IT.

  • Excellent look at How do you live stream the FIFA 2014 World Cup? 500,000 simultaneous users for a single game is no small accomplishment. Too complex to quickly summarize, but the technologies involved were Real Time Messaging Protocol (RTMP), HTTP Live Streaming, Cassandra, Python, Nginx+Lua, Clappr, logstash, elasticsearch, graphite, grafana, kibana, seyren, angular, mongo, redis, rails. A good tip: limit your queries + denormalize your data + send instrumentation data to graphite + use SSD.

  • If you like Redis you may like Disque, a new creation from antirez. Redis is often used as a job queue. Disque does that too, but moved "into an ad-hoc, self-contained, scalable, and fault tolerant design, with simple to understand properties and guarantees, but still resembling Redis in terms of simplicity, performances, and implementation as a C non-blocking networked server." Good discussion of why you want to use a message queue on HN.

  • Endless amusements exploiting race conditions. And a little cash too. Race conditions on Facebook, DigitalOcean and others (fixed): Using race conditions you could rate a page multiple times, then delete one of your reviews, and then rate again. This allowed me to inflate or deflate ratings of any page; The principle behind this bug is same as previous one: send as many requests to an endpoint with a list of wanted usernames; reused one promo code multiple times using race conditions.
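
The bugs described above are all check-then-act races: the server checks that something is allowed, then acts on it, and a second request slips in between the two steps. A minimal Python sketch of the promo-code case follows. Everything in it is hypothetical (the `PromoCodes` class, the `hammer` helper, the code `SAVE10`); a real service would enforce this in the database, for example with a unique constraint or a conditional update, rather than with an in-process lock.

```python
import threading


class PromoCodes:
    """Hypothetical in-memory promo-code store, for illustration only."""

    def __init__(self, codes):
        self._unused = set(codes)
        self._lock = threading.Lock()

    def redeem_unsafe(self, code):
        # Check-then-act with no lock: two requests can both pass the check
        # before either one removes the code, so both may "succeed".
        if code in self._unused:
            self._unused.discard(code)
            return True
        return False

    def redeem(self, code):
        # The check and the removal happen under one lock, so at most one
        # caller can ever redeem a given code.
        with self._lock:
            if code in self._unused:
                self._unused.remove(code)
                return True
            return False


def hammer(redeem, code, attempts=50):
    """Fire `attempts` concurrent redemptions of one code; return how many succeeded."""
    results = [False] * attempts
    barrier = threading.Barrier(attempts)

    def worker(i):
        barrier.wait()               # release all threads at roughly the same time
        results[i] = redeem(code)    # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    store = PromoCodes(["SAVE10"])
    print("successful redemptions with lock:", hammer(store.redeem, "SAVE10"))
```

The unsafe variant will often appear to work in a quick local test, which is part of why these bugs survive into production; the race only shows up when requests genuinely overlap.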

  • The IoT requires a cheap M2M network. The Weightless standard looks promising. 5km range at $2 a chip on available spectrum. The ability to communicate over long ranges brings the cost and complexity of the system down.

  • New trends in Modern Extreme Programming: Pair Programming becomes Mob Programming; Continuous Integration becomes Continuous Deployment; Collective Code-Ownership becomes Collective Product-Ownership; Products not Projects; Hypotheses as well as Stories; Test-First Programming becomes Monitoring-First Programming; Continuous Learning.

  • Do we ever use rate encoding? How the Brain Transforms Sound: In the auditory cortex, however, about half the neurons use rate coding, which instead conveys the structure of the sound through the density and rate of the neurons’ spiking, rather than the exact timing.

  • You Thought Amazon's Cloud Was Big? Alibaba's Is Huge. Alibaba has 1.4 million customers versus 1 million for Amazon, 1,200 employees, and also builds their own custom hardware. But when I searched for their cloud services, you know, to figure out what services it offered, I saw a lot of PR, but I didn't see anything like typical cloud documentation, pricing, APIs etc. 

  • Here's how Twitter builds their new trends experience: The new system consists of two major components. The first component is trends detection. It is built on top of Summingbird, responsible for processing Firehose data, detecting anomalies and surfacing trends candidates. The other component is trends postprocessing, which selects the best trends and decorates them with relevant context data.

  • InMobi runs a 14TB data warehouse on PostgreSQL. On a single database server.

  • This is what happens when you use decaffeinated upgrade software. Starbucks went down because "The main POS table was deleted." But really, this stuff happens, so shouldn't stores be more resilient by design? 

  • The Hidden Costs of Microservices. They include increased maintenance at the public API; greater complexity because it's a distributed system; no transactions between services; no inherent ordering of events between services; no foreign keys; querying sucks.

  • RocksDB on Steroids, or Scaling Concurrent Log-Structured Key-Value Stores: The effort paid off! We discovered cLSM [concurrent LSM] outperforms the state-of-the-art LSM implementations (including RocksDB and LevelDB), improving throughput by 1.5x to 2.5x. It demonstrates superior scalability with the number of cores, successfully exploiting twice as many as RocksDB could scale up to previously. 

  • ept/hermitage: What are the differences between the transaction isolation levels in databases? This is a suite of test cases which differentiate isolation levels.

  • Bottled Water: lets you transform your PostgreSQL database into a stream of structured Kafka events. This is tremendously useful for data integration.

  • liblfds: a portable, license-free, lock-free data structure library written in C.

  • Circuit: Self-managed infrastructure, programmatic monitoring and orchestration.

  • Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures: The search operation should not involve any stores, waiting, or retries; The parse phase of an update operation should not perform any stores other than for cleaning-up purposes and should not involve waiting, or retries; An update operation whose parse phase is unsuccessful should not perform any stores, besides those used for cleaning up the parse phase; The number and region of memory stores in a successful update should be close to those of a standard sequential implementation.