Stuff The Internet Says On Scalability For October 30th, 2015

Hey, it's HighScalability time:


Movie goers Force Crashed websites with record ticket presales. Yoda commented: Do. Or do not. There is no try.

  • $51.5 billion: Apple quarterly revenue; 1,481: distance in light years of a potential Dyson Sphere; $470 billion: size of insurance industry data play; 31,257: computer-related documents in a scanned library; $1.2B: dollars lost to business email scams; 46 billion: pixels in largest astronomical image; 27: seconds of distraction after doing anything interesting in a car; 10 billion: transistors in the SPARC M7 chip; $10K: cost to get a pound into low Earth orbit; $8.2 billion: Microsoft cloud revenue

  • Quotable Quotes:
    • @jasongorman: A $trillion industry has been built on the very lucky fact that Tim Berners-Lee never thought "how do I monetise this?"
    • Cade Metz: Sure, the app [WhatsApp] was simple. But it met a real need. And it could serve as a platform for building all sorts of other simple services in places where wireless bandwidth is limited but people are hungry for the sort of instant communication we take for granted here in the US.
    • Adrian Hanft: Brand experts insist that success comes from promoting your unique attributes, but in practice differentiation is less profitable than consolidation.
    • Jim Butcher: It’s a tradition. Were traditions rational, they’d be procedures.
    • Albert Einstein~ Sometimes I pretend I’m the Mayor of my kitchen and veto fish for dinner. ‘Too fishy’ is what I say!
    • @chumulu: “Any company big enough to have a research lab is too big to listen to it” -- Alan Kay
    • Robin Harris: So maybe AWS has all the growth it can handle right now and doesn’t want more visibility. AWS may be less scalable than we’d like to believe.
    • Michael Nielsen: Every finitely realizable physical system can be simulated efficiently and to an arbitrary degree of approximation by a universal model (quantum) computing machine operating by finite means.
    • Sundar Pichai~ there are now more Google mobile searches than desktop searches worldwide.
    • Joe Salvia~ The major advance in the science of construction over the last few decades has been the perfection of tracking and communication.
    • apy: In other words, as far as I can tell docker is replacing people learning how to use their package manager, not changing how software could or should have been deployed.
    • @joelgrus: "Data science is a god-like power." "Right, have you finished munging those CSVs yet?" "No, they have time zone data in them!"
    • @swardley: "things are getting worse. Companies are increasingly financialised and spending less on basic research" @MazzucatoM 
    • Dan Rayburn: The cause of what Akamai is seeing is a result of Apple, Microsoft and Facebook moving a larger percentage of their traffic to their in-house delivery networks.
    • @littleidea: containers will not fix your broken architecture you are welcome
    • spawndog: I've typically found the best gameplay optimization comes from a greater amount of creative freedom like you mention. Lets not do it. Lets do it less frequently. Lets organize the data into something relative to usage pattern like spatial partitions.
    • @awealthofcs: The 1800s: I hope I survive my 3 month voyage to deliver a message to London Now: The streaming on this NFL game in London is a bit spotty
    • @ddwoods2: just having buffers ≠ resilience; resilience = the capacities for changing position/size/kind of buffers, before events eat those buffers
    • unoti: There's a dangerous, contagious illness that developers of every generation get that causes them to worry about architecture and getting "street cred" even more than they worry about solving business problems. I've fallen victim to this myself, because street cred is important to me. But it's a trap.
    • @kelseyhightower: Kubernetes is getting some awesome new features: Auto scaling pods, Jobs API (batch), and a new deployment API for serve side app rollouts.

  • Great story on Optimizing League of Legends. The process: Identification: profile the application and identify the worst performing parts; Comprehension: understand what the code is trying to achieve and why it is slow; Iteration: change the code based on step 2 and then re-profile. Repeat until fast enough. Result: memory savings of 750kb and a function that ran one to two milliseconds faster. 

  • Fantastic article on Medium's architecture: 25 million uniques a month;  service-oriented architecture, running about a dozen production services; GitHub; Amazon’s Virtual Private Cloud; Ansible; mostly Node with some Go; CloudFlare, Fastly, CloudFront with interesting traffic allocations; Nginx and HAProxy; Datadog, PagerDuty, Elasticsearch, Logstash, Kibana; DynamoDB, Redis, Aurora, Neo4J; Protocol Buffers used as contract between layers; and much more.

  • Are notifications the new Web X.0? Notification: the push and the pull: Right now we are witnessing another round of unbundling as the notification screen becomes the primary interface for mobile computing.

  • Algorithm hacking 101. Uber Surge Price? Research Says Walk A Few Blocks, Wait A Few Minutes.

  • Can you build a transaction processing system that is at least one order of magnitude faster than the state-of-the-art systems using advanced processor features and fast interconnects? Yep. Fast In-memory Transaction Processing using RDMA and HTM: This paper described DrTM, an in-memory transaction processing system that exploits the strong atomicity of HTM and strong consistency of RDMA to provide orders of magnitude higher throughput and lower latency of in-memory transaction processing than prior general designs.

  • If you are interested in immutable databases, then The rise of immutable data stores is a good read.

  • The wood wide web. An internet of fungi help plants communicate: Hidden beneath the surface and entangled in the roots of Earth's astonishing and diverse plant life, there exists a biological superhighway linking together the members of the plant kingdom...This organic network operates much like our internet, allowing plants to communicate, bestow nutrition, or even harm one another...This fungal network has been found to allow plants to aid one another in growth and flourishing. 

  • Pinterest is open-sourcing an impressive number of their MySQL management tools. On GitHub.

  • Can we have our cake and transactions too? Why MongoDB, Cassandra, HBase, DynamoDB, and Riak will only let you perform transactions on a single data item: "most NoSQL systems have chosen to disallow general transactions altogether rather than become susceptible to the performance pitfalls that distributed transactions can entail." Must this be so? No: "not only is it possible to build scalable systems with high throughput distributed transactions, but there actually exist two classes of systems that can do so: those that sacrifice isolation, and those that sacrifice fairness."

  • The carrot for HTTP/2. Study Shows HTTP/2 Can Improve Website Performance Between 50-70 Percent.

  • I'd like to think this is true. A Certain Tendency Of The Database Community: We are moving towards large-scale edge computation. Given users are the source of truth for their own information, the database challenge is largely a constraint problem: how do we know where to send requests for information, where we supply some notion of how stale we allow that information to be, and how fast we need the information to be provided to us.

  • Here's how Foursquare's presence system works: our users have crawled the world for us and have told us more than 7 billion times where they’re standing and what that place is called. Each time they do, we attach a little bit more data to our models about how those places look to our phones out in the real world.

  • This sounds fun. Microsoft Runs the Largest Botnets to Protect Azure Customers: Microsoft owns the 10 largest botnets in the world, but just as it takes a thief to catch a thief, it may require running a botnet to save organizations from botnets.

  • This sounds funner. Inside the F1 Race's data center: As for why a car race might need its own portable data center with 130 tons of gear, F1 is arguably the most data-driven sport out there. Each car is outfitted with roughly 150 sensors generating about 2,000 data points a minute.

  • This looks like a better approach than iBeacons. The Physical Web is a Speed Issue. The delta between Netscape and Gmail was 10 years. The physical web is just getting started. JavaScript Bluetooth embedded in web pages, so you can talk directly to devices without an app, sounds like a winner. The physical web has devices broadcasting a URL using Bluetooth LE, the phone scans the URL, then it takes you to that page. So if you want to pay for parking you go to the correct parking meter. The page for that meter comes up and then you pay with a tap. It's not push, it's pull. It's not for a class of devices, it's per device. It's lightweight. You don't need a thousand apps, you just pull down a web page over this Internet thingy.

  • How do you run complex business logic with a few hundred milliseconds latency using commonly used technologies? Here are 9.5 tips with rich details for achieving your low latency goals: Measure your API Latency as a Function of Probability; Define How the Client Should React in Case of a Timeout; All subsystems must be able to make a decision; Handle Excess Traffic; Use Dynamic Timeouts for I/O Operations; Use Auto-Healing (Automatic Failover); Plot Latencies, it’s a Big Time Saver; Know Your (Latency) Enemy; Low-Latency Tweaks for Data Stores; Offload Tasks Before/After the Low Latency Critical Path.
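
Two of the tips above, measuring latency as a function of probability and using dynamic timeouts, can be sketched in a few lines of Python. This is a minimal illustration, not the linked article's implementation: the sample data, the nearest-rank percentile method, and the `dynamic_timeout` policy (headroom factor, floor, ceiling) are all assumptions made for the example.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def dynamic_timeout(samples, p=99, headroom=1.5, floor_ms=5, ceiling_ms=500):
    """Hypothetical policy: timeout = headroom * p-th percentile of recent
    latencies, clamped to the range [floor_ms, ceiling_ms]."""
    return min(ceiling_ms, max(floor_ms, headroom * percentile(samples, p)))

# Made-up latency samples in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 15]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
timeout_ms = dynamic_timeout(latencies_ms, p=90)

print(f"p50={p50} ms, p99={p99} ms, timeout={timeout_ms} ms")
```

The mean of these samples is about 37 ms, which describes neither the typical request nor the outlier. Looking at percentiles keeps the tail visible, which is why a timeout derived from a high percentile is a reasonable starting point.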

  • Do you still manually resize your images like animals? Some services: https://www.imgix.com, https://www.filepicker.com, http://cloudinary.com. Or you can learn Efficient Image Resizing With ImageMagick. But perhaps GraphicsMagick is better?

  • Let’s consider three scenarios for the evolution of supercomputers in the range of 1-50 exaflops: Devices and scaling: Millivolt switches must be perfected based on currently unknown principles of device physics and the unpredictable ingenuity of researchers — leading to commercialization of a new technology with at least 10 years lead time; 3D: This scenario is of intermediate desirability. Industry is very likely to develop 3-D technology for storage and mobile devices quite independently of supercomputers; Architectural specialization: this scenario is very likely to present the programmer with idiosyncratic architectures that require extensive experience by the programmer, and may lead to code that is not easily repurposed to other applications.

  • Periscope Data found queries ran 150 times faster on their cache. Building the Periscope Data Cache with Amazon Redshift.

  • Here's how Facebook deals with the challenges of handling 360 video. Under the hood: Building 360 video: Let's take a moment to think about how big these files can be. To create a 360 video, either you use a special set of cameras to record all 360 degrees of a scene simultaneously or you have to stitch together angles from, say, four GoPros on a stick. Incoming 360 video files are 4K and higher, at bit rates that can be over 50 Mb per second — that's 22 GB per hour of footage. And 3D 360 Stereo videos are twice that — 44 GB for an hour of footage. We tried to do a few things when working with file size: We wanted to decrease the bit rate and save storage, but we wanted to do it quickly so people wouldn't have to wait for the video, and we didn't want to compromise the video quality or resolution.
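
The file-size figures quoted above follow directly from the bit rate. A quick check in Python, assuming decimal units (1 GB = 1,000 MB) and assuming that stereo simply doubles the bit rate to 100 Mb per second; neither assumption is stated in the quote:

```python
def gigabytes_per_hour(megabits_per_second):
    """Convert a constant bit rate in megabits/s to gigabytes per hour."""
    megabits = megabits_per_second * 3600   # seconds in an hour
    megabytes = megabits / 8                # 8 bits per byte
    return megabytes / 1000                 # decimal gigabytes

mono = gigabytes_per_hour(50)     # 360 video at 50 Mb/s
stereo = gigabytes_per_hour(100)  # assumed: 3D stereo doubles the bit rate

print(f"mono: {mono} GB/hour, stereo: {stereo} GB/hour")
```

Under these assumptions the results are 22.5 and 45 GB per hour, consistent with the quoted 22 GB and 44 GB if the first figure is rounded down and then doubled.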

  • Everyone talks about big data but who does anything about it? New AWS Public Data Set – Real-Time and Archived NEXRAD Weather Data. The Next Generation Weather Radar (NEXRAD) is a network of 160 high-resolution Doppler radar sites that detects precipitation and atmospheric movement and disseminates data in approximately 5 minute intervals from each site.

  • Videos from OSCON Amsterdam 2015 are now online. Videos are also showing up for Velocity Conference 2015 (Amsterdam).

  • I've always wondered this too. Is Anyone Using Long-Distance VM Mobility in Production? Ivan Pepelnjak says: So far, I haven’t seen a single one, apart from the case where a DC was split across two buildings 100 m apart with tons of dark fiber in between.

  • IPTV scores a touchdown. Yahoo Pulls Off Successful NFL Webcast With Very Minor Hiccups: "The stream was split between multiple content delivery networks including Akamai, Limelight, Level 3 and Verizon amongst others." Yet Dan Rayburn still says: "Video broadcast via the Internet can’t compare to cable TV distribution, when it comes to reliability and QoS."

  • Legally it's better to look mind-numbingly stupid than to appear guilty. An Engineering Theory of the Volkswagen Scandal.

  • If you like S3 you might like Backblaze's B2 service better. David Rosenthal with a great analysis. More interesting numbers from Backblaze: B2 service actually competes with S3; once you get a certain amount bigger the economies of getting even bigger tail off; B2's storage is 2.5 times cheaper than S3's cheapest option with simpler and cheaper access charges; S3 is designed for 11 nines of durability, instead of B2's 8 nines.

  • Intel is taking the network seriously. Greg Ferro with a great analysis. Blessay: Intel the Networking Company: Intel is participating in the network space and, to some extent, pushing the network market forward to increase bandwidth and reduce latency; Intel is, in effect, entering the market for Telcos, Carriers & Service Providers; Intel influences the market within open source projects, or with partnerships with vendors.

  • Do you know how much your computer can do in a second? Lots more than it actually gets to do. Good discussion on Hacker News.

  • Performance in Big Data Land: Every CPU cycle matters: "We calculate one CPU cycle operation at 1 / 3,700,000,000 of a sec and multiply it by 100 Billion records equals 27 seconds of processing time." Save processing time: choose INT types; set AUTOCOMMIT to OFF.
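
The quoted arithmetic is easy to reproduce. The sketch below uses the 3.7 GHz clock and 100 billion records from the quote; the `seconds_for` helper and the idea of scaling by cycles per record are illustrative additions, and the model deliberately ignores memory stalls, parallelism, and everything else a real system does.

```python
CLOCK_HZ = 3_700_000_000     # 3.7 GHz, from the quote
RECORDS = 100_000_000_000    # 100 billion records, from the quote

def seconds_for(records, cycles_per_record, clock_hz=CLOCK_HZ):
    """Time spent if every record costs a fixed number of CPU cycles."""
    return records * cycles_per_record / clock_hz

one_cycle = seconds_for(RECORDS, 1)
ten_cycles = seconds_for(RECORDS, 10)

print(f"1 cycle/record:   {one_cycle:.1f} s")
print(f"10 cycles/record: {ten_cycles:.1f} s")
```

One extra cycle per record costs about 27 seconds at this scale, so shaving even a handful of cycles per record adds up to minutes.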

  • Interesting discussion on Lambda the Ultimate of Leslie Lamport: Thinking for Programmers video. I like Lamport's emphasis on the relationship between thinking and writing. The questions are: can that writing be in code or must it be in a separate spec? Does language choice make a difference?

  • FIT: A Distributed Database Performance Tradeoff: In this article, we discuss the three-way relationship between three such desirable features — fairness, isolation, and throughput (FIT) — and argue that only two out of the three of them can be achieved simultaneously.

  • Pivotal Greenplum Database: an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics.

  • More Than You Ever Wanted to Know about Synchronization: In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, Sun Microsystems and AMD.

  • GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System: This work describes the design and implementation of a new graph processing system based on Bulk Synchronous Parallel model. Our system is built on top of ZHT, a scalable distributed key-value store, which benefits the graph processing in terms of scalability, performance and persistency. The experiment results imply excellent scalability.

  • PalDB: designed to store side data, data that’s needed for a certain very small piece of an entire application.

  • facebook/network-connection-class: an Android library that allows you to figure out the quality of the current user's internet connection.

  • agate: a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that helps you solve real-world problems with readable code.

  • statex: A cross platform native application architecture...As the core of an application, the state machine transform (mutate) states upon receiving user's actions from the front-end views, or events from the backend. In general, it is a set of handlers for each action and event. 
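
The "set of handlers for each action and event" idea in the statex description can be sketched generically. The code below is not statex's API: `AppState`, the handler names, and `dispatch` are all hypothetical, and it returns a new immutable state from each handler rather than mutating in place, which is a design choice made for the example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AppState:
    """Hypothetical application state held at the core of the app."""
    count: int = 0
    online: bool = False

def on_increment(state, payload):
    # User action from a front-end view.
    return replace(state, count=state.count + payload.get("by", 1))

def on_connection_changed(state, payload):
    # Event from the backend.
    return replace(state, online=payload["online"])

# One handler per action or event name.
HANDLERS = {
    "increment": on_increment,
    "connection_changed": on_connection_changed,
}

def dispatch(state, name, payload=None):
    """Look up the handler for an action/event and return the next state.
    Unknown names leave the state unchanged."""
    handler = HANDLERS.get(name)
    if handler is None:
        return state
    return handler(state, payload or {})

state = AppState()
state = dispatch(state, "increment", {"by": 2})
state = dispatch(state, "connection_changed", {"online": True})
state = dispatch(state, "unknown")

print(state)
```

Keeping every transition in one table of handlers means views and backends only ever emit named actions or events; they never touch the state directly.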

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.