Stuff The Internet Says On Scalability For May 15th, 2015

Hey, it's HighScalability time:


Stand atop a volcano and survey the universe. (By Shane Black & Judy Schmidt)

  • 1 million: Airbnb's room inventory; 2 billion: Telegram messages sent daily; 2 billion: photos shared daily on Facebook; 10,000: sensors in every Airbus wing
  • Quotable Quotes:
    • Silicon Valley: “We’re about shaving yoctoseconds off latency for every layer in the stack,” he said. “If we rent from a public cloud, we’re using servers that are, by definition, generic and unpredictable.”
    • @liviutudor: Netflix: approx 250 Cassandra clusters over 7,000+ server instances #cloud
    • @GreylockVC: "More billion-dollar marketplaces will be created in the next five years than in the previous 20." - @simonrothman 
    • CDIXON: Exponential growth curves in the “feels gradual” phase are deceptive. There are many things happening today in technology that feel gradual and disappointing but will soon feel sudden and amazing.
    • @badnima: OH: "The gossip protocol has reached its scaling limits"
    • marcosdumay: People get pretty excited every time physicists talk about information. The bottom line is that information manipulation is just Math, viewed by a different angle.
    • Bill Janeway: There's only one way to hedge against uncertainty in venture capital...cash and control. Enough cash that when something goes wrong you can buy time to figure out what is and assess what you can do about it. 
    • zylo4747's coworker: Where's the step about preparing to have all your plans crushed and rushing shit out the door as fast as possible?
    • Martin Fowler: don't even consider microservices unless you have a system that's too complex to manage as a monolith. 
    • @postwait: Ingesting, querying, & visualizing data isn't a monitoring system. It isn't even sufficient plumbing for such a system. #srecon15europe
    • @techsummitpr: "Up to date weather conditions? It's not a marvel from Google, it's a marvel from the National Weather Service." @timoreilly #techsummitpr
    • @sovereignfund: Verified as legit: The top 25 hedge fund managers earn more than all kindergarten teachers in U.S. combined. 
    • Adrian Colyer: In their evaluation, the authors found that mixing MapReduce and memcached traffic in the same network extended memcached latency in the tail by 85x compared to an interference free network. 
    • @BenedictEvans: US ecommerce revenues 1999: $12bn 2013: $219bn
    • Gregory Hickok: the brain samples the world in rhythmic pulses, perhaps even discrete time chunks, much like the individual frames of a movie. From the brain’s perspective, experience is not continuous but quantized.
    • David Bollier: There is no master inventory of commons. They can arise whenever a community decides it wishes to manage a resource in a collective manner, with a special regard for equitable access, use and sustainability.

  • What’s Next for Moore’s Law?: I predict that Intel's 10nm process technology will use Quantum Well FETs (QWFETs) with a 3D fin geometry, InGaAs for the NFET channel, and strained Germanium for the PFET channel, enabling lower voltage and more energy efficient transistors in 2016, and the rest of the industry will follow suit at the 7nm node.

  • Don't read How to Build a Unicorn From Scratch – and Walk Away with Nothing if you are easily frightened. Years of work down the drain. **chills** To walk safely through the Valley: Focus on terms, not just valuation; Build a waterfall; Don’t do bad business deals just to get investment capital; Understand the motivations of others; Understand your own motivation.

  • How do you build a real-time chat system? Scaling Secret: Real-time Chat. The goal was to handle 50,000 simultaneous conversations. Pusher was used to deliver messages. For a database Secret used Google App Engine’s High-Replication Datastore. Some nice details on the schema and other issues. Good thread on HN where the main point of contention is whether an expensive service like Pusher should be used to do something so simple. Usual arguments about wasting money vs displaying your hacker plumage.

  • Under the hood: Facebook’s cold storage system. A top to bottom reengineering to save power for infrequently accessed photos. Yes, that's cool. Each cold storage datacenter uses 1/6th the energy of a normal datacenter while storing hundreds of petabytes of data. Erasure coding is used to store data. Data is scanned every 30 days to recreate any lost data. As capacity is added data is rebalanced to the new racks. No file system is used at all.

  • Architecting Websites For The HTTP/2 Era. HTTP/2 is entirely binary whereas HTTP/1.* was entirely text. A typical switch to binary for performance. There will be less domain sharding, less concatenation, looks like HTTPS will be required, compression is discouraged, and server side push is fast. 

  • A while ago Flickr talked about using deep networks to recognize stuff in images. Now we get to try it out. Go to You/Camera Roll Beta/Magic View. You'll then see all your photos categorized into bins like animal:bird/animal:dog/architecture:bridge/food:beverage and so on. And it does a pretty good job too. Impressive. Someday maybe we can get rid of tags! It does seem to think my little border collie is an animal:cat and I can't imagine she's thrilled about that.

  • This is why we have art, poetry, music, writing, and movies. Object recognition for free: “Our visual world is much richer than the number of words that we have to describe it,” says Alexei Efros, an associate professor of computer science at the University of California at Berkeley. “One of the problems with object recognition and object detection — in my view, at least — is that you only recognize the things that you have words for. But there are a lot of things that are very much visual, but maybe there aren’t easy describable words for them.”

  • The more things change...Goodbye, SaaS — hello, Containers-as-a-Service: When selling Salesforce to a mid to large organization, Salesforce expects multi-year contracts with pre-negotiated user counts, exactly like the on-premise predecessors it ridiculed during its early days. The whole idea of “pay for what you use” has been subsumed by the realities of the sweet cash flow dynamics of a traditional enterprise sale, which ends up as shelfware when customers over-provision.

  • Videos from RIPE 70 are available. RIPE is a five-day event where Internet service providers and network operators gather. Interesting talks on IPv6, SDN, DNS, and other inscrutable topics.

  • Colocation Refuses to Go Away: most colocation (in the retail sense) revenues (60 percent of such) do not come from enterprises (which make up for 40 percent), but from service providers (cloud, IT, telcos, etc.).

  • Fascinating Disassembling the Dash on Bits of Cents. Dash is Amazon's "buy me now button" that turns the real world into yet another opportunity for a transactional (as in commerce) interaction. It's much thicker than you would expect, probably because it uses a long lasting AAA battery. Good point in the comments that Dash may make more sense in a business setting than in a home.

  • Hey, batching still works. Performance doubling with message coalescing.

  • Interesting discussion on how to teach the idea of scaling, ratios, and relationships to kids. Scaling. It's not something typically taught directly, but should be: I think that this is where the Wall might be: when students need to shift from “getting an answer” to understanding relationships. The calculations mind-set is so deeply hard-wired into the brain that this shift never happens. Includes exercises with paper, pattern blocks, and Cuisenaire rods.

  • Impressive work. The Discovery of Apache ZooKeeper’s Poison Packet. The Internet community at its best.

  • It's the dream of most every programmer to create a tools company. Jay Kreps is living the dream. In a proud open source tradition Jay is taking his baby, Kafka, a most excellent publish-subscribe messaging backbone he developed while at LinkedIn, and is making a business of it. His new company is Confluent. Here's a great article on his story and what he's trying to accomplish now: From scaling LinkedIn to selling a nervous system for enterprise data

  • Scaling Bitcoin to Billions of Transactions Per Day (paper): A decentralized system is proposed whereby transactions are sent over a network of micropayment channels (a.k.a. payment channels or transaction channels) whose transfer of value occurs off-blockchain.

  • Cloud Foundry Summit 2015 videos are now available. Lots and lots of good information.

  • While this may never happen--Please stop calling databases CP or AP--the explanations of linearizability and availability are top notch, as is the explanation of why CAP isn't that useful a distinction. Though I am left wondering how we should talk about these issues, and we do need to be able to do that.

  • Extrapolating Data with Day-of-Week Effects. Nice explanation of queries for week totals, average weekly growth, weekly predictions, day-of-week trends, and daily predictions.

  • StorageMojo with a round up of papers from FAST ’15: StorageMojo’s Best Paper: CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems, Analysis of the ECMWF Storage Landscape, Reliable, Consistent, and Efficient Data Sync for Mobile Apps, Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage and Network-bandwidth.

  • There's no changing the future. Immutable Infrastructure Is The Future: First, make sure all data on your system can be externalized - use RDS or Cassandra.  Keep your logs off system.  Use something like logstash or Splunk.  Monitor externally.   Get to rolling updates with your classic automation.  Now just swap your classic automation out for an image build, that reproduces the configuration of your classic automation. You’re done.

  • Spot pricing savings can be dramatic: For just $.04 per week (4 cents) I would have been well above the price required to hold onto machine instances [c4.xlarge] 95% percent of the time representing of savings of over 80%

  • Common wisdom says don't trust text search to the database, but sometimes good enough is good enough: Full text search in milliseconds with PostgreSQL.

  • Are you mortal? High Availability for Mere Mortals: Start simple; Split into Tiers; Introduce Redundancy; Expand Redundancy to Other Tiers; Data Center Redundancy.

  • Networking is the current system bottleneck. Maybe not so in the future. IBM demos first fully integrated monolithic silicon photonics chip: IBM says that, in theory, its technology could allow for chips with up to eight channels. 800Gbps from a single optical transceiver would be pretty impressive.

  • Introduction to Azure DocumentDB: DocumentDB is a new “Big Data” database engine created by Microsoft and managed as a service within their Azure Cloud framework.

  • Crickets Going Quiet: Questions of Evolution and Scale: success for us isn’t just one scaled solution, but multiple prototypes that local teams take forward. By identifying those local teams and empowering them to refine and move forward with their ideas, we’re hoping to create a school of fish not a whale—quick to move, and adaptable to rapidly changing and sometime volatile socio-political circumstances.

  • A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs:  In this work we detail a scalable fully pipelined FPGA accelerator that performs LZ77 compression and static Huffman encoding at rates up to 5.6 GB/s.

  • caplogic/MappedBus: a Java based low latency, high throughput message bus, built on top of a memory mapped file.

  • uber/ringpop: Scalable, fault-tolerant application-layer sharding. Good talk by Jeff Wolski.
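
For anyone curious what application-layer sharding can look like in code, here is a minimal sketch of a generic consistent hash ring, one common way to map keys to nodes so that adding or removing a node only moves a small share of the keys. This is not ringpop's code or API: the `HashRing` class, its method names, the node names, and the `replicas=100` virtual-point count are all assumptions made up for illustration, and the membership and failure detection a real system needs are left out entirely.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (hypothetical; not ringpop's implementation)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        """Place `replicas` virtual points for this node on the ring."""
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the existing owner
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        """Remove every virtual point owned by this node."""
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                del self._points[bisect.bisect_left(self._points, point)]

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[idx % len(self._points)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("user:42"))
```

The property worth noticing: when a node leaves, only the keys it owned get reassigned; every other key keeps its owner, which is what makes the scheme tolerant of nodes coming and going.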