Stuff The Internet Says On Scalability For April 1st, 2016

Hey, this is no joke, it's HighScalability time:


A glorious battle in EVE: tens of thousands of pilots fighting tens of thousands of pilots in real time, all on a single shard. If you like this sort of Stuff then please consider offering your support on Patreon.

  • $9.3B: punishment for Google's temerity in using Java; 200: computer scientists and neuroscientists at Google’s DeepMind; 22: cores in Intel's new Xeon E5-2600 v4 CPU; 12x: boost in spectrum efficiency over current 4G cellular technology using a massive antenna system.

  • Quotable Quotes:
    • Linus Torvalds:  I’m not a big visionary. I’m a very plodding pedestrian engineer, and I try to keep my eyes firmly on the ground. I’ll let others make the big predictions about where we’ll be in 5, 10 or 25 years
    • theymos: "Core" doesn't think anything because it's not any sort of unified organization.
    • whalesalad: We are running Kubernetes in production at FarmLogs and LOVE it.
    • @StackPointCloud: The operational complexity associated with monitoring containers is multiplied given the 1:N relationship of host:containers. #NYCK8s
    • hu6Bi5To: AWS is significantly more expensive like-for-like, but it's worth remembering that you wouldn't architect your whole system that way if you were targeting AWS.
    • Demis Hassabis [DeepMind]: We don't think just observing is enough for intelligence, you also have to act. Ultimately that’s the only way you can really understand the world.
    • @inottawa: @TeslaMotors can't login to mytesla. Any chance in scaling up those servers?
    • Adrian Colyer: Cliffhanger can achieve the same hit rate with 45% less memory capacity. When memory is one of the most expensive resources in the datacenter, that’s definitely significant!
    • Google: We showed how Cloud Dataflow users no longer have to worry about specifying the number of workers or partitions, and how Cloud Dataflow dynamically adjusts the number of workers over time.
    • @PandoDaily: The switch to subscription has meant huge growth for Adobe
    • spriggan3: You're not hip enough anymore, the new good practice in the valley is femtoservices. Each statement running on its own server.
    • @adrianco: If you are confused about the Tesla Model 3 "launch" think of it as a huge $1000 Kickstarter project
    • Baidu: Our algorithm is able to use crowd data from Baidu maps to predict how many people will be [at a certain location] in the next two hours
    • @robertoglezcano: By 2020, 80% of people around the world (6 billion) will own a smartphone
    • @adrianco: Let me know when you run a 1000 node Cassandra cluster on Kubernetes :-) 
    • Seph Skerritt: The algorithm doesn’t care what you really are. It matters what you choose, and what you think you are.
    • @JimPethokoukis: "Last year, YouTube and sites like it generated $385 million in royalties ... vinyl records brought in $416 million"
    • @gigastacey: "Customers press a Dash button once every minute of the day." 
    • Julian Baggini: One of the paradoxes of creativity is that originality tends towards sameness and similarity. What makes a Wagner opera stand out from others is also what makes it unmistakably Wagnerian.
    • Grant Jensen: In this study, we revealed the beautiful complexity of this machine, [which may] be the strongest motor known in nature. The machine lets M. xanthus, a predatory bacterium, move across a field to form a ‘wolf pack’ with other M. xanthus cells, and hunt together for other bacteria on which to prey.

  • Chamath Palihapitiya: AWS is a tax on the compute economy.  so whether you care about mobile apps, consumer apps, IoT, SaaS etc etc, more companies than not will be using AWS vs building their own infrastructure.  ecommerce was AMZN's way to dogfood aws, and continue to do so so that it was mission grade.  if you believe that over time the software industry is a multi, deca trillion industry, then ask yourself how valuable a company would be who taxes the majority of that industry. 

  • This is spooky. Google does know everything but it's AI that makes that knowledge manifest in the world. Google shocked this man by offering sympathy on the death of his father:  Google Now was offering him condolences on the death of his dad before showing him what could be emotionally charged photos. "Mind. Blown. I'm sad, I'm amazed, I'm taken back. What a lovely moment for some automated robot voice to express it's sympathy to me," he said.

  • Stack Overflow still does the mostest with the leastest. Nick Craver with a great post on Stack Overflow: The Hardware - 2016 Edition. Sure, there's a lot of hardware porn (with pics), but Nick's mental checklist of the process he goes through to help determine what to order is really insightful. It's too big to include here, but some highlights: Is this a scale up or scale out problem? (Are we buying one bigger machine, or a few smaller ones?); How much redundancy do we need/want? (How much headroom and failover capability?); Will this server/application touch disk? (Do we need anything besides the spinny OS drives?). Also, an interesting analysis by hu6Bi5To of what Stack Overflow might look like on AWS. Less hardware redundancy, less always on capacity, more geographical redundancy. 

  • Mobile is still eating the world

  • Moore’s law really is dead this time. Honest question: with all the activity around AR/VR right now, does the end of Moore's law make it impossible to shrink AR headsets enough that they no longer need to be tethered?

  • Global Cloud - Active-Active and Beyond: "We [Netflix] decided to create a global cloud where we would be able to serve requests from any member in any AWS region where we are deployed.  The diagram below shows the logical structure of our multi-region deployment and the default routing of member traffic to AWS region." Being able to service any user from any datacenter is a huge simplification at the application level.
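
Netflix's point is that when every region can serve every member, failover becomes a routing decision rather than an application change. Below is a minimal Python sketch of that idea. It assumes a hypothetical home region per member and a simple health flag per region. It is not Netflix's actual traffic-steering layer, and the region names are only example AWS region identifiers.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool = True


def route(home_region: str, regions: list[Region]) -> str:
    """Serve a member from their home region if it is healthy; otherwise fall back
    to the first healthy region. Because any region can serve any member, failover
    is only a routing decision (hypothetical sketch, not Netflix's routing layer)."""
    by_name = {r.name: r for r in regions}
    home = by_name.get(home_region)
    if home is not None and home.healthy:
        return home.name
    for region in regions:
        if region.healthy:
            return region.name
    raise RuntimeError("no healthy region available")
```

Marking a region unhealthy moves its members elsewhere without the application knowing. In a real deployment the fallback choice would also weigh latency and spare capacity.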

  • Google wants you to know they are in it to win it. Really. One concern developers have with Google's cloud efforts is that Google may pull the plug on their cloud, like they have for many other projects before. Ad giant Google thinks its cloud biz could be bigger than its adverts: "We wouldn't be scaling up in this way if we lost money every time we deployed and if we didn't think this is a real business," Diane Greene, the Chocolate Factory's VP of cloud told the press. "We're investing billions in this and it's a long-term business plan."

  • 1000 nodes and beyond: updates to Kubernetes performance and scalability in 1.2. Can't do any better than jbeda: I just want to give a public shout out to Wojtek on this blog post. It shows scalability for an actual scenario at levels that most users won't need (10M req/s!). Beyond that, there is a clear methodology with lots of hard data. This along with listing the work that it took to get there. Very good post!

  • The NxM problem. What do you do when you have ‘N’ services that you want to deploy on ‘M’ hosts? CONTAINERS IN PRODUCTION: CASE STUDIES, PART 1. Spotify uses containers: "as a Spotify user’s app requires a particular microservice, it makes a request to Spotify’s servers and gets the data it needs. As new features are added, or more users are simultaneously asking for their own app services to start, Spotify needs to spin up more containers to meet the additional demand, and then drop them when there isn’t the need to maintain processing power on the server." So does Built.io: [using] "Docker we were able to create a platform as a service which is something our competitors couldn’t do. It made our story much more powerful. Using our MBaaS with Docker in production, you can upload full Node.js applications, for example." So does IIIEPE: "The biggest discovery in our experiment was that in our containers and in our Git repos, this workflow doesn’t just work for Drupal, it also works for Node.js applications."
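
At its core, the NxM problem is bin packing: fit N services with known resource needs onto M hosts with finite capacity. Below is a toy sketch using first-fit decreasing, a classic bin-packing heuristic, with memory as the only resource. The service names, sizes, and single-resource model are illustrative assumptions. This is not how Spotify or any real container scheduler places work.

```python
def place(services: dict[str, int], hosts: dict[str, int]) -> dict[str, str]:
    """Assign each service (name -> memory needed) to a host (name -> capacity)
    using first-fit decreasing. Raises ValueError if a service cannot fit."""
    free = dict(hosts)  # remaining capacity per host
    placement: dict[str, str] = {}
    # Place the biggest services first; they are the hardest to fit later.
    for svc, need in sorted(services.items(), key=lambda kv: kv[1], reverse=True):
        for host in hosts:  # scan hosts in a fixed order: "first fit"
            if free[host] >= need:
                free[host] -= need
                placement[svc] = host
                break
        else:
            raise ValueError(f"no host can fit {svc} ({need})")
    return placement
```

Real schedulers handle several resources at once (CPU, memory, ports), add spreading constraints for redundancy, and re-place work as hosts come and go. The greedy core is still a useful mental model.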

  • Aren't the NoSQL wars over? To SQL or NoSQL? That’s the database question. Nice overview that generated a lot of comments. Still a lot of passion out there. 

  • 1.7 petabytes and 850M files lost, and how we survived it. The amazing story of all the work it took to recover really important scientific data that was stored for years on a "temporary" file system. There are lots of lessons and heroics, but the key one is to enforce a deletion policy on temporary filesystems. Temporary means temporary.
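
Below is a minimal sketch of "temporary means temporary": a purge job that deletes files whose modification time is older than a retention window. It assumes mtime is a good enough age signal. Access time is often unreliable because many filesystems are mounted with noatime. The function name and its `dry_run` and `now` parameters are hypothetical. At petabyte scale you would lean on the storage system's own policy tooling rather than walk the tree from Python.

```python
import time
from pathlib import Path
from typing import Optional


def purge_old_files(root: str, max_age_days: float, dry_run: bool = True,
                    now: Optional[float] = None) -> list[Path]:
    """Find (and, unless dry_run, delete) regular files under root whose
    modification time is older than max_age_days. Symlinks are skipped."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400
    # Collect first, then delete, so the tree is never mutated while walking it.
    expired = [p for p in Path(root).rglob("*")
               if p.is_file() and not p.is_symlink() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for path in expired:
            path.unlink()
    return sorted(expired)
```

Defaulting to `dry_run=True` means the first run only reports what would go. Announcing the policy and running it from day one is the real lesson: users then never start treating scratch space as an archive.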

  • The amount of collective effort that has gone into avoiding garbage collection might be more than the effort that has gone into selling people advertising or sugar water. Why Apache Arrow is the future for open source Columnar In-Memory Analytics: Arrow is designed to maximize the cache locality, pipelining and SIMD instructions. Cache locality, pipelining and super-word operations frequently provide 10-100x faster execution performance. Since many analytical workloads are CPU bound, these benefits translate into dramatic end-user performance gains. These gains result in faster answers and higher levels of user concurrency.
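
The cache-locality argument rests on storing each column contiguously instead of each row. Below is a small Python sketch of that transposition. Each column becomes one typed `array` buffer, so a scan over a single column touches only that buffer. It shows only the layout idea: Python will not give you SIMD, and Arrow's real format adds things like null bitmaps. The taxi-style column names are made up.

```python
from array import array


def to_columns(rows: list[dict], schema: dict[str, str]) -> dict[str, array]:
    """Transpose row dicts into one contiguous typed array per column.
    schema maps column name -> array typecode ('q' = int64, 'd' = float64)."""
    cols = {name: array(code) for name, code in schema.items()}
    for row in rows:
        for name, col in cols.items():
            col.append(row[name])
    return cols


rows = [{"id": 1, "fare": 7.5}, {"id": 2, "fare": 12.0}, {"id": 3, "fare": 3.25}]
cols = to_columns(rows, {"id": "q", "fare": "d"})
total_fare = sum(cols["fare"])  # reads one packed float64 buffer, not every row
```

In a row layout, summing `fare` drags every other field of every row through the cache. In the columnar layout the loop walks densely packed 8-byte values, which is what makes vectorized execution possible in native code.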

  • This is not a future I look forward to. The 1,000-Revenue Stream Lifestyle: every one of us has a mixture of qualities that can potentially be monetized. Things that we did freely in the past may soon be gamified with incentives.

  • Again, better scheduling for performance. Polaris: Faster Page Loads Using Fine-Grained Dependency Tracking.
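
The idea behind dependency-aware page loading is that a resource can be fetched as soon as the things it actually depends on are done, rather than in the order the HTML happens to list them. Below is a minimal sketch using Python's `graphlib` to group a hypothetical dependency graph into waves of fetches that can run in parallel. Polaris's real scheduler tracks much finer-grained dependencies and prioritizes the critical path; this shows only the scheduling skeleton.

```python
from graphlib import TopologicalSorter


def fetch_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """deps maps each resource to the resources it depends on.
    Returns batches; everything in a batch can be fetched in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all dependencies satisfied
        waves.append(ready)
        ts.done(*ready)                 # pretend the whole batch finished
    return waves


deps = {
    "index.html": set(),
    "app.js": {"index.html"},
    "style.css": {"index.html"},
    "data.json": {"app.js"},
    "logo.png": {"style.css"},
}
waves = fetch_waves(deps)
```

The more precisely dependencies are tracked, the wider each wave becomes, and wider waves mean more parallel fetches on the network.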

  • Great, far-ranging overview. The real cost of sequencing: scaling computation to keep pace with data generation: As the cost of sequencing continues to decrease and the amount of sequence data generated grows, new paradigms for data storage and analysis are increasingly important. The relative scaling behavior of these evolving technologies will impact genomics research moving forward.

  • Joining Amazon and Google, Azure is now in the function business too. Functions: Azure Functions is an event driven, compute-on-demand experience that extends the existing Azure application platform with capabilities to implement code triggered by events occurring in virtually any Azure or 3rd party service as well as on-premises systems. 

  • Do you need a spring cleaning for your systems? Looks like it may be profitable. Buffer Overflow saved $132K a year with an IT infrastructure audit. They reduced duplicate storage, optimized data warehouses, and improved efficiency across other AWS resources. In the process they found a lot of zombies. Good hunting.

  • Writing a very fast cache service with millions of entries in Go: When the cache was filled, it had more than a second latency for 99th percentile. Metrics indicated that there were over 40 million objects in the heap and GC mark and scan phase took over four seconds...The third way to omit GC for cache entries was related to optimization presented in Go version 1.5...fasthttp achieves its performance by reducing work that is done by HTTP Go package...ffjson documentation claims it is 2-3 times faster than standard json.Unmarshal, and also uses less memory to do it...we sped up our application from more than 2.5 seconds to less than 250 milliseconds for longest request.
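
The trick in the post is to stop the heap from holding millions of individual objects. Entries are serialized into one big byte buffer, and the index is a map from a 64-bit key hash to an integer offset. The post ties this to a Go 1.5 optimization under which the GC does not scan maps whose keys and values contain no pointers. Below is a Python sketch of that data layout only. Python's memory management works differently, so do not expect the GC benefit here. The class name, header format, and missing eviction are assumptions for illustration.

```python
import hashlib
import struct
from typing import Optional


class ByteArenaCache:
    """Entries live in one bytearray; the index maps hash(key) -> offset.
    Each entry is [key length][value length][key bytes][value bytes].
    Hypothetical sketch: overwritten entries are never reclaimed (no eviction)."""

    HEADER = struct.Struct("<HI")  # uint16 key length, uint32 value length

    def __init__(self) -> None:
        self.arena = bytearray()
        self.index: dict[int, int] = {}

    @staticmethod
    def _hash(key: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "little")

    def set(self, key: str, value: bytes) -> None:
        k = key.encode()
        self.index[self._hash(k)] = len(self.arena)
        self.arena += self.HEADER.pack(len(k), len(value)) + k + value

    def get(self, key: str) -> Optional[bytes]:
        k = key.encode()
        off = self.index.get(self._hash(k))
        if off is None:
            return None
        klen, vlen = self.HEADER.unpack_from(self.arena, off)
        start = off + self.HEADER.size
        if bytes(self.arena[start:start + klen]) != k:
            return None  # hash collision with a different key
        vstart = start + klen
        return bytes(self.arena[vstart:vstart + vlen])
```

Storing the key next to the value lets `get` detect hash collisions instead of silently returning another key's data. A production version would use a ring buffer so that old entries are overwritten and memory stays bounded.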

  • Are the days of free flowing milk and honey over? Alphabet is learning how to spell Bean Counter. Google: Scaling Ahead of Cloud Burst: More recently, Alphabet closed a robotics unit and appears to be scrutinizing its other “bets” more closely. Tony Fadell, who runs the Nest home automation business Google acquired two years ago for $3.2 billion, openly complained of the new “fiscal discipline era” in an interview with the tech news site The Information last week.

  • An amazing deep dive into geofencing algorithms. Unwinding Uber’s Most Efficient Service: It is clear to me that the team at Uber under-engineered this problem. Thoughtfully designing this service could trim down the number of nodes by an order of magnitude and save hundreds of thousands of dollars each year. 

  • Yes indeed, complexity is preserved. Hiding it doesn't mean it's not there. That's the first law. Immutability is not enough: Surprisingly, many of the state update bugs that are rife in imperative programs can also occur in purely functional code...It turns out that our purely functional rendering code is sensitive to ordering in non-obvious ways...Programming is largely about building models...Each level of the model has its own set of properties, and the properties at one level do not automatically extend to other levels. Our higher level model includes identity: there is a stable, logical entity that we refer to as Manuel, whose position changes from frame to frame. In a sense, we are modelling a mutable object. Implementing that model with immutable data structures does not eliminate the difficulties of dealing with state updates – it just shifts them to a different conceptual level.

  • Wow. Empowerment is not an excuse to disembowel others. That elephant on the road is a god too. NoOps: Developer operations manifesto: "The cloud is here now. You [DevOps] have no place in the software process any more." Somehow when the anonymous author says "We are not flawless" I don't quite believe him. Also, Software Defined Talk: Let’s Hope Google Isn’t Serious about ‘NoOps’.

  • Do algorithms have to be politically correct or should they just work? Interesting question. In a way data mining is all about finding stereotypes. Here's a real world case. Can a Dress Shirt Be Racist?: "he set about developing an algorithm that could customize your shirt without needing a tape measure...In that first batch of 30, the shirts fit best on testers who were Caucasians. They seemed to fit worse, in a predictable way, on people who weren’t Caucasian...noted the anomaly and added a question on what he called “ethnicity”...The question has proven invaluable to sizing his customers." Can't DNA help here? Shouldn't a person who is X% Neanderthal have different-fitting clothes too?

  • Murat with another great paper dissection. He wields a sharp scalpel. Paper review: Measuring and Understanding Consistency at Facebook. What did we learn?: we cannot quantify the benefits of consistency models that include transactions, e.g., serializability and snapshot isolation, or the benefit of even read-only transactions on other consistency models

  • Iron.io's message queueing service hits 1M msg/sec: IronMQv3 Hits Dos Commas: The TL;DR is something similar to viewstamped replication via 2PC and a gossip protocol for membership, with synchronous replication to all replicas; meaning that each message is consistent across all replicas. Our unit of sharding is queues, and we automatically spread them around onto different nodes. We use 3 replicas by default for queues, meaning in each of these tests, each message is written to 3 replicas before returning, and we serialize all reads and writes through one of the replicas for each particular queue (i.e. master-slave). At the bottom, we’re running embedded local rocksdb inside of each MQ process for persistent storage of queues and all metadata.
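
The shape described here has three parts: queues as the unit of sharding, three replicas per queue, and every write acknowledged only after all replicas have it, with reads going through one replica. Below is a toy in-memory Python sketch of that shape. IronMQ does not say how it places queues on nodes. Rendezvous (highest-random-weight) hashing is an assumption picked because it is simple and stable. The node and queue names are made up.

```python
import hashlib


def _score(node: str, queue: str) -> int:
    return int.from_bytes(hashlib.sha256(f"{node}/{queue}".encode()).digest()[:8], "big")


def replicas_for(queue: str, nodes: list[str], n: int = 3) -> list[str]:
    """Rank nodes by a per-queue hash score and keep the top n.
    The first node acts as the queue's master; the rest are followers."""
    return sorted(nodes, key=lambda node: _score(node, queue), reverse=True)[:n]


class Cluster:
    def __init__(self, nodes: list[str], n: int = 3) -> None:
        self.nodes = list(nodes)
        self.n = n
        self.storage = {node: {} for node in self.nodes}  # node -> queue -> messages

    def put(self, queue: str, msg: str) -> bool:
        # Synchronous replication: append on every replica before acknowledging.
        for node in replicas_for(queue, self.nodes, self.n):
            self.storage[node].setdefault(queue, []).append(msg)
        return True

    def get(self, queue: str) -> list[str]:
        # Reads are served by the master replica only.
        master = replicas_for(queue, self.nodes, self.n)[0]
        return list(self.storage[master].get(queue, []))
```

Rendezvous hashing moves only the queues whose top-three set changes when a node joins or leaves, which keeps rebalancing cheap. The real system must also handle replica failures, which this sketch ignores.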

  • Dart isn't dead yet. The new AdWords UI uses Dart — we asked why: Meeting all these very high bars for user experience, latency and feature velocity at the same time in a very large mission critical application is very hard. We thought Dart and Angular together was a good foundation to build the additional infrastructure we wanted to build for achieving all these goals.

  • One Billion Taxi Rides in PostgreSQL. autotldr, a bot, with the TLDR: PostgreSQL's default storage layout stores data in an uncompressed form so reading 100 MB/s of data off a disk during a sequential scan won't yield the same amount of rows per second as it would from a gzip-compressed CSV file.
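
The arithmetic behind that TLDR is short: when the disk is the bottleneck, rows per second equals disk throughput divided by the bytes each row occupies on disk. Compression shrinks the denominator. The 500-byte row and 5:1 gzip ratio below are made-up numbers for illustration. The sketch also ignores the CPU cost of decompression and any per-row storage overhead.

```python
def rows_per_second(disk_mb_per_s: float, bytes_per_row_on_disk: float) -> float:
    """Rows a sequential scan can read per second if disk throughput is the limit."""
    return disk_mb_per_s * 1_000_000 / bytes_per_row_on_disk


ROW_BYTES = 500   # hypothetical uncompressed row size on disk
GZIP_RATIO = 5    # hypothetical compression ratio for the CSV

uncompressed = rows_per_second(100, ROW_BYTES)              # 200,000 rows/s
compressed = rows_per_second(100, ROW_BYTES / GZIP_RATIO)   # 1,000,000 rows/s
```

At the same 100 MB/s, the compressed file delivers five times as many rows per second, as long as the CPU can decompress fast enough to keep up.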

  • Here's a story on Building the UI for the new The Times website. Good practical advice with some code examples. Biggest win: A huge benefit of creating Components and Modules is that your code is encapsulated. In my opinion, this is one of the most important things about large builds. It ensures that a change in a specific CSS file will only affect that Component/Module and not have any side-effect on other elements. 

  • Percona Live featured talk with Avi Kivity: Scylla, a Cassandra-compatible NoSQL database at two million requests per second: When we investigated the performance bottlenecks in Cassandra, we saw that while non-blocking message-passing was used between nodes (as it should be), blocking locks were used for inter-core communications, and blocking I/O APIs were used for storage access. To fix this problem, we wrote Seastar (http://seastar-project.org), a server application framework that uses non-blocking message-passing for inter-core communications and storage access. 

  • How do you handle querying the many databases that come with services? DataEngArchSimple: "Simple runs on over 35+ separate services, and most of those have their own Postgres database instance. So in a world without the data team or data warehouse, what does it look like to answer a simple business question about our customers?" They collapse all the databases into Redshift.

  • That one where it works in testing but sh*its the bed in production. Who doesn't hate that one? HipChat with a detailed March 25th Incident Report. Under higher load they were dropping logs with their legacy ELK stack. A new ELK stack was deployed and started failing 30 minutes later. There's some Heisenberg action because the high load interfered with rollback attempts. One lesson was to keep monitoring their canary server. Another lesson was not to push during high-traffic parts of the day, that is, when the West Coast is active. Not sure that's the right lesson. What happens when you have a global audience and there is no downtime?
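
"Keep monitoring the canary" matters because this failure showed up 30 minutes after deploy, long after a one-shot check would have passed. Below is a minimal sketch of a canary verdict that is meant to be re-evaluated on every metrics window rather than once. The 2x ratio, the 0.1% error-rate floor, and the 500-request minimum are hypothetical thresholds. This is not HipChat's tooling.

```python
from dataclasses import dataclass


@dataclass
class Window:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(canary: Window, baseline: Window,
                   max_ratio: float = 2.0, min_requests: int = 500) -> str:
    """Compare one metrics window of the canary against the baseline fleet."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    # The absolute floor keeps a near-zero baseline from flagging every error.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > allowed else "promote"
```

Run it on every window, for example each minute, for as long as the canary lives. A "promote" at minute 5 says nothing about minute 30 under peak load.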

  • Also good advice for building a tech campus, which is really just another kind of city. Data Mining Reveals the Four Urban Conditions That Create Vibrant City Life: vibrant activity can only flourish in cities when the physical environment is diverse...city districts must serve more than two functions so that they attract people with different purposes at different times of the day and night...city blocks must be small with dense intersections that give pedestrians many opportunities to interact...Vibrant urban areas are those with dense streets...the team found that a crucial factor for vibrancy is the presence of “third places,” locations that are not homes...density of people also turns out to be important.

  • The big data market: Surprisingly, larger enterprises (those with more than 5,000 employees) are adopting big data technologies much faster than smaller companies. You’d think the smaller, younger companies would be nimbler in embracing new tech, but when it comes to big data, the opposite is the case.

  • An impressive deep dive into React Native performance. Improvements were in two broad categories: doing less and scheduling. Result: Events Dashboard startup is now twice as fast.

  • Interesting from the perspective of dealing with backward industries where not everything is API-driven yet. Scaling Lessons From The Fastest Growing SaaS Company Ever: Create automated processes one step at a time; When dealing with established industries, expect things that don’t scale; Sometimes you have to sacrifice scalability (and product) to get an edge; Most people are caught off guard by exponential growth; Bottlenecks can come from surprising places; Premature optimization is still the root of all evil.

  • If you are interested in Bluetooth here's a good introduction: Bluetooth Technology 101

  • Looks like it might be a good course. Scaling Applications with Microservices, MassTransit, and RabbitMQ.

  • apache/incubator-beam: a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends like Apache Spark, Apache Flink, and Google Cloud Dataflow.

  • github/ccql: a simple executable utility which executes a given set of queries on a given set of MySQL hosts in parallel.

  • citusdata/citus: Citus horizontally scales PostgreSQL across commodity servers using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable real-time responses on large datasets.
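
Below is a minimal sketch of the two halves of that description. Rows are hash-distributed across shards by a distribution column, and an aggregate runs as per-shard partials that a coordinator merges. An average cannot be merged from per-shard averages, so each shard returns (sum, count). `zlib.crc32` stands in for Citus's real hash function, and the function and column names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_for(key, shard_count: int) -> int:
    """Map a distribution-column value to a shard (crc32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode()) % shard_count


def distribute(rows: list[dict], key: str, shard_count: int) -> dict[int, list[dict]]:
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row[key], shard_count)].append(row)
    return shards


def distributed_avg(shards: dict[int, list[dict]], column: str):
    # Each shard computes a partial (sum, count) in parallel; the coordinator merges.
    partials = [(sum(r[column] for r in rows), len(rows)) for rows in shards.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

Because every row with a given key hashes to the same shard, queries filtered on that key touch one server. Aggregates fan out to all shards and pay only for merging small partial results.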

  • ifesdjeen/hashed-wheel-timer: the concept of the Timer Wheel is rather simple to understand: in order to keep track of events on given resolution, an array of linked lists (alternatively - sets or even arrays, YMMV) is preallocated. When an event is scheduled, its address is found by dividing deadline time t by resolution and wheel size.
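
Below is a minimal Python sketch of the wheel described above. Dividing the deadline by the resolution gives a tick number. That tick modulo the wheel size picks the slot, and the number of full turns still to wait is stored as a `rounds` counter. Time is an abstract integer here, and the class is a hypothetical illustration, not the ifesdjeen implementation.

```python
from typing import Callable


class HashedWheelTimer:
    """Wheel of `size` slots, each covering `resolution` time units.
    An event with deadline t goes into slot (t // resolution) % size and
    carries `rounds`: how many full turns must pass before it fires."""

    def __init__(self, resolution: int, size: int) -> None:
        self.resolution = resolution
        self.size = size
        self.slots = [[] for _ in range(size)]  # each entry: [rounds, callback]
        self.tick = 0  # index of the next tick to process

    def schedule(self, deadline: int, callback: Callable[[], None]) -> None:
        due_tick = max(deadline // self.resolution, self.tick)  # past deadlines fire next
        rounds = (due_tick - self.tick) // self.size
        self.slots[due_tick % self.size].append([rounds, callback])

    def advance(self) -> None:
        """Process the current slot, then move the wheel forward by one tick."""
        index = self.tick % self.size
        still_waiting = []
        for entry in self.slots[index]:
            if entry[0] == 0:
                entry[1]()        # due on this tick: fire it
            else:
                entry[0] -= 1     # one fewer full turn to wait
                still_waiting.append(entry)
        self.slots[index] = still_waiting
        self.tick += 1
```

Scheduling is O(1), and each tick touches only one slot. The trade-off is that deadlines are rounded to the wheel's resolution.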

  • The Grace Programming Language: The purpose of Grace is to allow novices to discover programming in the simplest possible way. Other famous languages such as Java or Python are widely used by professionals, but may be hard to assimilate for a beginner in programming. That is what the object-oriented Grace language is made for.

  • mxgmn/ConvChain: Procedural generation from examples with convolutional neural nets and MCMC.

  • FPGA-based Hardware Acceleration for a Key-Value Store Database: This thesis investigates the use of a Field-Programmable Gate Array (FPGA) as a hardware accelerator for a key-value database. Utilized as a platform of reconfigurable logic, the FPGA offers massively parallel usability at a much faster pace than a traditional software-enabled database system.

  • Parsing Techniques - A Practical Guide [First Edition PDF]: It is therefore surprising that there is no book which collects the knowledge about parsing and explains it to the non-specialist. Part of the reason may be that parsing has a name for being “difficult”. In discussing the Amsterdam Compiler Kit and in teaching compiler construction, it has, however, been our experience that seemingly difficult parsing techniques can be explained in simple terms, given the right approach. The present book is the result of these considerations.