Stuff The Internet Says On Scalability For April 28th, 2017

Hey, it's HighScalability time:

Do you understand the power symbol? I always think of O as a circuit being open, or off, and the | as the circuit being closed, or on. Wrong! Really the symbols are binary, 0 for false, or off, 1 for true, or on. Mind blown.
If you like this sort of Stuff then please support me on Patreon.

  • 220,000-Core: largest Google Compute Engine job; 100 million: Netflix subscribers; 1.3M: Sling TV subscribers; 200: Downloadable Modern Art Books; 25%: Americans Won't Subscribe To Traditional Cable; 84%: image payload savings using smart CDN; 10^5: number of world-wide cloud data centers needed; 63%: more Facebook clicks using personality targeting; 2.5 million: red blood cells created per second; 

  • Quotable Quotes:
    • Silicon Valley~ The only reason Gilfoyle and I stayed up 48 f*cking straight hours was to decrease server load, not keep it the same. 
    • Robert Graham: In other words, if the entire Mirai botnet of 2.5 million IoT devices was furiously mining bitcoin, its total earnings would be $0.25 (25 cents) per day.
    • @BoingBoing: John Deere just told US Copyright office that only corporations can own property, humans merely license it
    • mattbillenstein: Lin Clark's talk makes this sound like they implemented a scheduler in React -- basically JS is single-threaded, so they're implementing their own primitives and a scheduler for executing those on that main thread.
    • Robert M. Pirsig: When analytic thought, the knife, is applied to experience, something is always killed in the process.
    • @vornietom: I honestly feel bad for the people on the Placebo March who thought they were at the Science March but double blind testing is important
    • MIT: we can capture and monitor human breathing and heart rates by relying on wireless reflections off the human body.
    • Mohamed Zahran~ Surprisingly enough, traditional homogeneous multi-cores are really heterogeneous. Why is that? Every core is running at its own frequency. Many processors are now a traditional core and a GPU. FPGAs are already with us. Automata Processor is a specialized processor that can execute non-deterministic finite automata (regular expressions) orders of magnitude faster than a GPU. Neuromorphic brain inspired chips. Fancy GPUs.
    • @craigbuj: amazing how fast China Internet companies can scale: ofo: 10+ million daily rides in China Uber: ~6 million daily rides globally
    • knz: CockroachDB's architecture is an emergent property of its source code. 
    • @Jason: Good news: over 70b spent on digital ads in 2016.  Terrifying news: 89% of growth was Facebook & Google. Via @iab
    • @swardley: I think we need to stop thinking about AMZN as a future $1T biz and more think about it as a future $10T biz, possibly much more.
    • @timoreilly: "Algorithms are opinions embedded in code." @mathbabedotorg #TED2017 
    • Google: I think we [Google Cloud] have a pretty good shot at being No. 1 in five years
    • limitless__: Folks who think programmer skill declines when you're 40+ are 100% wrong. What declines is your willingness to put up with stupidity and what increases is your willingness and ability to tell someone to fly a kite when they tell you to work stupid hours and do stupid things.
    • @nicusX: "Don't worry about X. X is transparently managed for you". Reads: "When things go wrong you'll never be able to fix it" #mechanicalSympathy
    • defined: What's up is the rampant ageism in the industry - the perception that you are washed up as a "dinosaur" developer after a certain age, maybe 40 or so, and belong in management. We "dinosaurs" - we happy few - are living evidence to the contrary.
    • user5994461: AWS Spot Instances are under bid. The highest bidder takes the instances, the price changes all the time. Google Spot Instances (preemptibles) are 80% off and that's it. It's simple.
    • James Hamilton: in 10 years, ML will be more than 1/2 the world's server-side footprint.
    • qnovo: if we examine the average capacity in smartphones over the past 5 years, we see that it has grown at about 8% annually. A battery in a 2017 smartphone contains about 40 – 50% more capacity (mAh) than it did in 2012.
    • StorageMojo: Bottom line: the NVRAM market is heating up. And that’s a very good thing for the IT industry.
    • Crazycontini: We need a lot more help to clean up the world’s crypto mess.
    • Pramati Muthalaxe: Irrespective of what Facebook says, all of them have one objective — to get more money out of potential advertisers. That requires a constant decay of your reach.
    • danluu: It looks like, for a particular cache size, the randomized algorithms do better when miss rates are relatively high and worse when miss rates are relatively low.
    • Freddie deBoer: Why is everything so expensive? Because Silicon Valley and Wall Street are taking huge percentages out of transactions they once didn’t. That’s why. 
    • eachro: I'd probably rank these areas in order of importance as follows: deep learning, pgms, reinforcement learning. Deep learning as a framework is pretty general. PGMs, as I have seen them, don't really have any one killer domain area - maybe robotics and areas where you want to explicitly model causality? Applications for reinforcement learning seem the more niche, but maybe that's because they haven't been adequately explored to the extent that DL/CNNs/RNNs/PGMs have been.
    • Ramon Leon: Nonsense; the idea behind an ORM is to eliminate repetitive hand written SQL and mapping code to simplify and speed up development. It isn’t and never was about hiding SQL from the developer.
    • Zombieball: Just to clarify, as others have pointed out, the leadership principles at Amazon are in no way a new experiment. I don't know their origins but I wouldn't be surprised to learn they are nearly as old as the company itself (probably a few years younger). They are deeply engrained in Amazon culture. They are brought up frequently. They are used as the basis for both hiring and promotion decisions.
    • Thomas Thwaites: It may take a village to raise a child, but it takes an entire civilization to build a toaster.
    • @BitIntegrity: OH: "So far, the Infinite monkey theorem is just giving us an infinite number of javascript frameworks, and no Shakespeare"
    • @postwait: When you say your system scales, but does so with tragically suboptimal economics, then it doesn't actually scale.
    • Billy Tallis: Optane Memory is Intel's latest attempt at an old idea that is great in theory but has struggled to catch on in practice: SSD caching.
    • @dberkholz: Perhaps the biggest change in tech in the past 20 years is its second derivative — the increasing rate of change. Adapt or die.
    • Samuel Greengard: An intriguing aspect of emerging chip designs for AI, deep learning, and machine learning is the fact that low-precision chip designs increasingly prevail. In many cases, reduced-precision processors conform better to neuromorphic compute platforms and accelerate the deployment and possibly training of deep learning algorithms.
    • Henri Bergius: when most people think about microservices, they think systems that communicate with each other using HTTP APIs. I think this is quite limited, and something that makes microservices a lot more fragile than they could be. Message queues provide a much better solution.
    • Ben Einstein: Our usual advice to hardware founders is to focus on getting a product to market to test the core assumptions on actual target customers, and then iterate. Instead, Juicero spent $120M over two years to build a complex supply chain and perfectly engineered product that is too expensive for their target demographic.
    • Hedda Hassel Mørch: But in fact physical matter (at least the aspect that physics tells us about) is more like software: a logical and mathematical structure. According to the hard problem of matter, this software needs some hardware to implement it. Physicists have brilliantly reverse-engineered the algorithms—or the source code—of the universe, but left out their concrete implementation.
    • David Weinberger: We thought knowledge was about finding the order hidden in the chaos. We thought it was about simplifying the world. It looks like we were wrong. Knowing the world may require giving up on understanding it.
    • Christopher Domas: A processor is not a trusted black box for running code; on the contrary, modern x86 chips are packed full of secret instructions and hardware bugs. In this talk, we'll demonstrate how page fault analysis and some creative processor fuzzing can be used to exhaustively search the x86 instruction set and uncover the secrets buried in your chipset.

  • Is Kubernetes the next OpenStack? The Cloudcast #296. No. The core architecture team for Kubernetes ensures there's consistency across the project. There's more cohesiveness in thinking of Kubernetes as one architecture. OpenStack was made up of different projects that didn't talk to other projects' APIs. OpenStack had diverging goals. There was a lot of infighting over the vision of what OpenStack was supposed to be. Some thought they were replacing VMware, others thought they were replacing AWS. Kubernetes has a more targeted mandate: it is a container platform that will run in a bunch of different clouds, and over time features will be added to support a broader set of application patterns. Another difference is big cloud providers like Google, Microsoft, DigitalOcean, and others run Kubernetes in production all the time. For a long time Rackspace was the only one to run OpenStack in production.

  • Perhaps Apple's new search ads system needs to funnel in more advertisers? Apple cuts affiliate app commission rates by nearly 65 percent, really hurting the little guy trying to eke out a living with affiliate revenue. YouTube is also decreasing monetization opportunities for the little guy. For a highly educational discussion on how content creators can build a diverse income stream, see Jack Spirko's Why YouTube Creators Should Consider Patreon and Why Most are Broke. He's one guy who has this how-to-make-money-off-of-content thing down cold. Highly recommended.

  • Nice gloss by Wissam Abirached on How Facebook Live Scales. The thundering herd problem of many simultaneous live viewers--millions of simultaneous streams, millions of users on the same stream--is handled by a global network of Edge Cache servers, request coalescing, and load balancing across Edge Cache servers. Request coalescing is when multiple requests for the same packet hit the Edge Cache, they are grouped together in a request queue and only one goes through to the Origin Server.
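
A minimal sketch of the request coalescing idea, not Facebook's implementation. It assumes a hypothetical `fetch_from_origin` callable standing in for the Origin Server, and a made-up key name for the stream segment. The first request for a missing key becomes the leader and goes to the origin; every concurrent request for the same key waits on the leader's result instead of making its own trip.

```python
import threading
import time
from concurrent.futures import Future


class CoalescingCache:
    """Edge cache sketch: concurrent misses for one key share one origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical stand-in for the origin call
        self._lock = threading.Lock()
        self._cache = {}      # key -> cached value
        self._in_flight = {}  # key -> Future for a pending origin fetch

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First miss for this key: this caller will go to the origin.
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                value = self._fetch(key)
                with self._lock:
                    self._cache[key] = value
                future.set_result(value)
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
        # Followers block here until the leader publishes the result.
        return future.result()


def demo(num_viewers=50):
    """Simulate many viewers asking for the same segment at once."""
    origin_calls = []

    def slow_origin(key):
        origin_calls.append(key)
        time.sleep(0.05)  # pretend the origin is far away
        return f"segment-bytes-for-{key}"

    cache = CoalescingCache(slow_origin)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(cache.get("stream-1/seg-42")))
        for _ in range(num_viewers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(origin_calls), results
```

Calling `demo()` sends 50 simulated viewers at the same key and reports how many requests actually reached the origin, which in this sketch is one.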

  • Going multi-cloud with Google Cloud Endpoints and AWS Lambda. Google's going for embrace and replace. Just give us a try AWS developers they say, on just one little part of your project, you'll like us. We have some good stuff. We know you won't move everything over, just give us part of your workload. And don't worry about those egress costs.

  • Will Oracle's cloud strategy of scaling-up instead of scaling-out succeed? James Hamilton on How Many Data Centers Needed World-Wide: I don’t believe that Oracle has, or will ever get, servers 2x faster than the big three cloud providers. I also would argue that “speeding up the database” isn’t something Oracle is uniquely positioned to offer. All major cloud providers have deep database investments but, ignoring that, extraordinary database performance won’t change most of the factors that force successful cloud providers to offer a large multi-national data center footprint to serve the world.

  • Amazon's Spectrum lets you run SQL queries against data stored in S3. What are the implications?: is S3 the new HDFS?; Disaggregation of storage and compute; in time you'll be able to run SQL queries over your entire AWS data storage stack; save tons of time and mental energy just getting data in the right place; data lakes will start looking much more like true lakes rather than a series of ponds connected periodically via ETL.

  • Examining 3D XPoint’s 1,000 Times Endurance Benefit: What I find interesting in this exercise is that the cell lifetime calculated using this approach is not significantly different between the NAND flash SSDs and the 3D XPoint Optane SSDs.  On average the Optane SSDs have better numbers, but the 32,850 W/E figure for the Optane DC P4800X is only marginally superior to the 31,025 W/E figure for the NAND-based DC P3700.  The 32GB Optane Memory client SSD offers endurance that is ten times that of the NAND-based client SSDs (on the top two lines).

  • Infor tests about 50 multi-tenant apps on AWS for under $1 per month using SWF, Lambda & DynamoDB. A Lambda function runs every 2 minutes using CloudWatch events. Workflows are retrieved from the database and set up in SWF. Every 2 minutes the workflows are queried to see what the next step to run is. Biggest benefit of this architecture is the low cost. Also good: iRobot: Vacuuming Up Microservices on AWS. The idea is a robot company using a serverless architecture can manage millions of robots without worrying about infrastructure.

  • Here's why Wayfair ditched their ORM. The usual story: ORMs are great when queries are simple, but as queries get more complex ORMs hurt more than they help. Now Wayfair directly dispatches SQL queries using SQLAlchemy Core.

  • Walmart on The Benefits of Server Side Rendering Over Client Side Rendering: The main difference is that for SSR your server’s response to the browser is the HTML of your page that is ready to be rendered, while for CSR the browser gets a pretty empty document with links to your javascript. That means your browser will start rendering the HTML from your server without having to wait for all the JavaScript to be downloaded and executed...our numbers showed better engagement from the customer with rendering early.

  • Videos from Facebook's F8 Conference are now available

  • Lessons learned from five years with Node.js: Be careful with JavaScript math operations; Get into the habit of calling callbacks as the last statement in your functions (this one bites me all the time); Think really hard about what you’re asking your Node.js server to do. Synchronous work is really bad news; Make your dependencies explicit, unless it’s absolutely necessary to do otherwise; think about data versioning from the start; Node.js is all on one thread, so use cluster, but pay close attention - it’s not a magical solution; and many more.

  • Cloud computing competition heats up. Tencent, a $233 billion company, is opening up datacenters all over the world, including in the US. Tencent Opens Cloud Data Center in Silicon Valley.

  • Tools are always where a lot of the most interesting work is done. Here's How Yelp Runs [20] Millions of Tests Every Day: First, the developer triggers a seagull-run from the console. This starts a Jenkins job to build a code artifact and generate a test list. Tests are then grouped together and passed to a scheduler to execute the tests on the Seagull cluster. Finally, test results are stored in Elasticsearch and S3. Tests are bundled using a Greedy Algorithm or using Linear Programming. For each bundle the scheduler creates one mesos executor and schedules it on the Seagull cluster whenever sufficient resources are offered by the Mesos master. They launch more than 2 million Docker containers in a day. To handle this, we need to have around 10,000 CPU cores in our seagull cluster during peak hours. To reduce costs, we started using an internal tool, called FleetMiser. FleetMiser saved us ~80% in cluster cost. Before FleetMiser, the cluster was completely on AWS On-Demand Instances with no auto scaling.
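
The post doesn't say which greedy algorithm Yelp uses for bundling, so here is one plausible sketch under my own assumptions: longest-processing-time-first, where tests are sorted by historical duration and each one is dropped into whichever bundle currently has the least total runtime. The test names and durations are made up for illustration.

```python
import heapq


def bundle_tests(durations, num_bundles):
    """Greedy LPT bundling: longest test first, into the lightest bundle so far.

    durations: dict of test name -> expected runtime in seconds (hypothetical data).
    Returns a list of bundles, each a list of test names.
    """
    # Min-heap of (total_seconds, bundle_index) so the lightest bundle pops first.
    heap = [(0.0, i) for i in range(num_bundles)]
    heapq.heapify(heap)
    bundles = [[] for _ in range(num_bundles)]

    # Sort by duration descending, with the name as a tie-breaker for determinism.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        bundles[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return bundles


example_durations = {
    "test_a": 30,
    "test_b": 20,
    "test_c": 20,
    "test_d": 10,
    "test_e": 10,
}
example_bundles = bundle_tests(example_durations, num_bundles=2)
```

With the example data the two bundles come out at 50 and 40 seconds. Greedy is fast and usually close to balanced, which is presumably why Linear Programming is the other option when a tighter packing is worth the extra solve time.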

  • Here's how Spotify processes 100 billion events a day. Reliable export of Cloud Pub/Sub streams to Cloud Storage. The data is used to calculate royalties, make recommendations, and so on. 

  • Better Than Linear Scaling: I think database systems no longer should be seen as single server systems...When we add additional nodes to the deployment, effectively we increase not only CPU cores, but also the memory that comes with the node...the effect of extra memory could be non-linear (and actually better than linear)...we can achieve better-than-linear scaling in a sharded setup...With five nodes, the improvement is 13.81 times compared to the single node.
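
To see why extra memory can push scaling past linear, here is a toy model. Everything in it is an assumption for illustration: uniform access over the dataset, a fixed amount of cache per node, and made-up relative costs for a memory hit versus a disk read. It is not the benchmark from the article and doesn't try to reproduce the 13.81x figure.

```python
def throughput(nodes, data_gb=100.0, mem_per_node_gb=10.0,
               mem_cost=1.0, disk_cost=100.0):
    """Relative throughput of a sharded cluster in a toy cost model.

    All parameter values are hypothetical. With uniform access, the fraction
    of reads served from memory equals the fraction of the data that fits in
    the combined memory of all nodes.
    """
    hit_rate = min(1.0, nodes * mem_per_node_gb / data_gb)
    avg_cost = hit_rate * mem_cost + (1.0 - hit_rate) * disk_cost
    # Each node works in parallel, so throughput grows with node count
    # and also with the cheaper average access cost.
    return nodes / avg_cost


def speedup(nodes, **model):
    """Throughput relative to a single node under the same model."""
    return throughput(nodes, **model) / throughput(1, **model)
```

In this model one node caches 10% of the data and five nodes cache 50%, so `speedup(5)` comes out at roughly 8.9 rather than 5: you get five times the cores plus a much better hit rate.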

  • Yep, when you use your database as your application container you need bigger iron until budgets bust and there's no iron big enough. Using More Than a Hundred Stored Procedures in Production: As we use hundreds of stored procedures and functions, it is highly demanding on resources. Our database server has to be powerful. We’re forced to use dedicated hardware. In fact we tried using AWS RDS’s db.m4.4xlarge which is a 16-core 64GB RAM virtual machine with the max IOPS allocated. It didn’t work.

  • Unroll.me. Your daily reminder that the end state of free services is selling your data. We know anonymized data is not really all that anonymous. And don't you get the feeling that beyond the red velvet ropes is a VIP section where raw data is sold like crack on the street corner?

  • Interesting commentary to Writing a Time Series Database from Scratch. You can go a long ways without experience, but experience always tells.

  • Fascinating look into a state-of-the-art distributed database: The SQL layer in CockroachDB. A lot is going on, but then again there always is a lot going on. Five main component groups are described: pgwire: the protocol translator between clients and the executor; the SQL front-end; the SQL middle-end; the SQL back-end; the executor, which coordinates between the previous four things and the session object. 

  • Everything is a little better with Bacon. Roger Bacon, that is, the 13th-century English philosopher. Clearly Bacon didn't know about distributed tracing or he would have understood how you can distinguish, from an identical effect, what the proper cause is. Interesting that in his time, with Aristotle finally making it to the West, thinkers woke to the power of observation and the direct experience of the natural world as a source of knowledge, while now we are finding knowledge in faith: The Dark Secret at the Heart of AI, Computer Generated Math Proofs, Evolutionary Engineering. I'm certain he would be perplexed at how many people really don't want to correct errors in their thinking.

  • "Caching and RAM are the answer to everything," says Flickr. Client, Network, Server and Application Caching on the Web: A Website with the HTTP headers wisely defined will provide a better experience for the users; The Cache-Control: public HTTP header directive allows different parts of the Network to cache a response; set up a cache server between the application and the client; A global code memoization is going to last in-memory during all the application execution cycle; share a data cache between application instances.
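
A small sketch of two of those layers, using only the Python standard library. The function names, the product-rendering example, and the 300 second max-age are hypothetical. `functools.lru_cache` gives you in-process memoization that lives as long as the process does, and the header helper shows the shape of a `Cache-Control: public` response header that lets shared caches along the way keep a copy.

```python
from functools import lru_cache

# Counts how often the expensive function body actually runs (for illustration).
CALLS = {"count": 0}


@lru_cache(maxsize=1024)
def render_product_summary(product_id: int) -> str:
    """Pretend-expensive render; repeated calls with the same id hit the memo."""
    CALLS["count"] += 1
    return f"<li>product {product_id}</li>"


def cache_headers(max_age_seconds: int) -> dict:
    """Response headers that allow shared (network) caches to store the response."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```

Calling `render_product_summary(7)` twice runs the body once, and `render_product_summary.cache_info()` reports the hit. The memo is per process, so it does not replace a data cache shared between application instances.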

  • Seems like a useful tutorial. Serverless Stack is a free comprehensive guide to creating full-stack serverless applications. Create a note taking app from scratch using React.js, AWS Lambda, API Gateway, DynamoDB, and Cognito. 

  • What's Amazon up to? All the Things. Amazon Strategy Teardown: Amazon’s latest raft of acquisitions could indicate more hunger; Amazon’s next pillar is likely to be AI; Amazon’s interest in GRAIL may foreshadow healthcare AI interest; Amazon’s corporate venture arm, the Alexa Fund, has nurtured the developer and hardware ecosystem around Alexa as a universal AI assistant; The company is also making more diversified investments into logistics, cloud apps, and media; Secretive R&D skunk work Lab126 is behind Amazon’s recent consumer tech hits; Next-generation logistics is a centerpiece of Amazon’s R&D; Amazon has also raised its profile in consumer goods and physical retail. 

  • Scalable, Lie-Detecting Timeserving with Roughtime: To scale the signing workload, Roughtime uses a Merkle Tree to sign a batch of client requests with a single signature operation. The root of the tree is signed and included in all responses. With a batch size of 64 a single 3.0 GHz Skylake core can sign 3.9 million requests per second.
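
A rough sketch of the batching trick, with loud assumptions: it uses SHA-256 from the standard library, one-byte domain prefixes of my own choosing for leaves and interior nodes, and it leaves out the signature entirely. It is not the Roughtime wire format. The point is that the server hashes 64 client nonces into one root, signs that root once, and hands each client the short list of sibling hashes that proves its nonce is under the signed root.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    # Prefix keeps leaf hashes distinct from interior hashes (illustrative choice).
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(nonces):
    """Return every level of the tree, leaves first, root level last."""
    n = len(nonces)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch requires a power-of-two batch size")
    level = [_leaf(x) for x in nonces]
    levels = [level]
    while len(level) > 1:
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the sibling sits at the paired index
        index //= 2
    return path


def verify(nonce, index, path, root):
    """Recompute the root from one nonce and its sibling path."""
    h = _leaf(nonce)
    for sibling in path:
        h = _node(h, sibling) if index % 2 == 0 else _node(sibling, h)
        index //= 2
    return h == root
```

For a batch of 64 the proof is six hashes long, so each response carries one shared signature over the root plus a handful of hashes instead of its own signature.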

  • Lessons learned from moving my side project to Kubernetes to run a small study site for tournament Scrabble players: Anyway, now that I finally have a zero-downtime deploy story, may I never have to touch this setup again, and I can start working on adding multiplayer mode. Yay! And in the end it was worth it; deploying a new container is now as simple as pushing to master.

  • Silicon Valley security robot beat up in parking lot. Could this be Doctor Who trying to prevent the development of the Cybermen?

  • Funcatron: lets you deploy serverless on any cloud provider or in your private cloud. Focus on the functions, avoid vendor lock-in.

  • ScanNet: video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations, includes thousands of scenes with millions of annotated objects like coffee tables, couches, lamps, and TVs.

  • Book review: "Fooled by Randomness" and "The Black Swan": The biggest surprise for me personally from these books is that one of the most feared categories of bugs, race conditions, are not black-swan bugs, but are instead white-swan bugs. They are quite random, and very amenable to the Gaussian statistical tools that Taleb so rightly denigrates for black-swan situations. You can even do finite amounts of testing and derive good confidence bounds for the reliability of your software—but only with respect to white-swan bugs such as race conditions. So I once again feel lucky to have the privilege of working primarily on race conditions in concurrent code!

  • Research at Google and ICLR 2017. Lots of different papers from the 5th International Conference on Learning Representations.

  • Stanford Lecture Notes on Probabilistic Graphical Models: These notes form a concise introductory course on probabilistic graphical models. Probabilistic graphical models are a subfield of machine learning that studies how to describe and reason about the world in terms of probabilities. They are based on Stanford CS228, taught by Stefano Ermon, and are written by Volodymyr Kuleshov, with the help of many students and course staff.

  • Adversarial examples in the physical world: This paper shows that even in such physical world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera. Also Scientists Can Blind A Self-Driving Car From Seeing Pedestrians.

  • Enabling Wide-spread Communications on Optical Fabric with MegaSwitch (article): In this paper, we seek an optical interconnect that can enable unconstrained communications within a computing cluster of thousands of servers. We present MegaSwitch, a multi-fiber ring optical fabric that exploits space division multiplexing across multiple fibers to deliver rearrangeably non-blocking communications to 30+ racks and 6000+ servers.