
Stuff The Internet Says On Scalability For November 11th, 2016

High Scalability

11 Nov 2016 — 22 min read

Hey, it's HighScalability time:

Hacking recognition systems with fashion.

If you like this sort of Stuff then please support me on Patreon.

9 teraflops: PC GPU performance for VR rendering; 1.75 million requests per second: DDoS attack from cameras; 5GB/mo: average data consumption in the US; ~59.2GB: size of Wikipedia corpus; 50%: slower LTE within the last year; 5.4 million: entries in Microsoft Concept Graph; 20 microseconds: average round-trip latencies between 250,000 machines using direct FPGA-to-FPGA messages (Microsoft); 1.09 billion: Facebook daily active mobile users; 300 minutes: soaring time for an AI controlled glider; 82ms: latency streaming game play on Azure;

Quotable Quotes:
- AORTA: Apple’s service revenue is now consistently greater than iPad and Mac revenue streams making it the number two revenue stream behind the gargantuan iPhone bucket.
- @GeertHub: Apple R&D budget: $10 billion NASA science budget: $5 billion One explored Pluto, the other made a new keyboard.
- Steve Jobs: tie all of our products together, so we further lock customers into our ecosystem
- @moxie: I think these types of posts are also the inevitable result of people overestimating our organizational capacity based on whatever limited success Signal and Signal Protocol have had. It could be that the author imagines me sitting in a glass skyscraper all day, drinking out of champagne flutes, watching over an enormous engineering team as they add support for animated GIF search as an explicit fuck you to people with serious needs.
- @jdegoes: Devs don't REALLY hate abstraction—they hate obfuscation. Abstraction discards irrelevant details, retaining an essence governed by laws.
- @ewolff: There are no stateless applications. It just means state is on the client or in the database.
- @mjpt777: Pushing simple logic down into the memory controllers is the only way to overcome the bandwidth bottleneck. I'm glad to see it begin.
- @gigastacey: Moral of @0xcharlie car hacking talk appears to be don't put actuators on the internet w/out thinking about security. #ARMTechCon
- @markcallaghan: When does MySQL become too slow for analytics? Great topic, maybe hard to define but IO-bound index nested loops join isn't fast.
- @iAnimeshS: A year's computing on the old Macintosh portable can now be processed in just 5 seconds on the #NewMacBookPro. #AppleEvent
- @neil_conway: OH: "My philosophy for writing C++ is the same as for using Git: 'I stay in my damn lane.'"
- qnovo: Yet as big as this figure sounds, and it is big, only 3 gallons of gasoline (11 liters) pack the same amount of energy. Whereas the Tesla battery weighs about 1300 lbs (590 kg), 3 gallons of gasoline weigh a mere 18 lbs (8 kg). This illustrates the concept of energy density: a lithium-ion battery is 74X less dense than gasoline.
- @kelseyhightower: I'm willing to bet developers spend more time reverse engineering inadequate API documentation than implementing business logic.
- @sgmansfield: OH: our ci server continues to run out of inodes because each web site uses ~140,000 files in node_modules
- @relix42: “We use maven to download half the internet and npm to get the other half…”
- NEIL IRWIN: economic expansions do not die of old age—an old expansion like our current one is not likelier to enter a recession in the next year than a young expansion.
- @popey: I am in 6 slack channels. 1.5GB RAM consumed by the desktop app. In 100+ IRC channels. 25MB consumed by irssi. The future is rubbish.
- @SwiftOnSecurity: The only way to improve the security of these IoT devices is market forces. They must not be allowed to profit without fear of repercussions
- The Ancient One: you think you know how the world works. What if I told you, through the mystic arts, we harness energy and shape reality?
- @natpryce: "If you have four groups working on a compiler*, you'll get a four-pass compiler" *and you describe the problem in terms of passes
- @PatrickMcFadin: Free cloud APIs are closing up as investors start looking for a return. Codebender is closing down
- @ServerlessConf: Stephen Fink explaining how IBM @openwhisk use stem cell and warm containers to respond to requests with low latency #serverlessconf
- Evan Jones: That [App Engine] causes most performance issues to show up as large bills, rather than production failures. This is good news for our customers, but bad news for Bluecore, so we keep a close eye on our costs.
- @qhardy: Brute force computing against huge data sets was better than a thousand theory-based algorithms.
- @tlberglund: Serverless is to cloud-based µService architectures as triggers are to RDBMSes. Serverless is not a general-purpose architecture. Discuss.
- The World Walker: “The Order is based on orthopraxy rather than orthodoxy,” she said. “Whoah,” said Meera. “Small words. I’m from Brixton.” “Orthodoxy means ‘right teaching’, orthopraxy means ‘right practice’
- @viktorklang: "One microservice is no microservice—they come in systems." - deliberately misquoting @CarlHewitt
- @tapbot_paul: Did people bitch this much when Apple got rid of ADB? They just announced laptops with 4 PCIe ports and people act like its the end of days.
- avitzurel: If you give Deis/OpenShift/Kubernetes to the typical YC founder (I am not trying to insult anyone here) that's trying to get their app in their cloud, it's just too much. It's too much of things they don't care about right now. By not caring about it right now they are likely vendor-locking themselves for a very long time.
- @jjarmoc: In a relatively short time we've taken a system built to resist destruction by nuclear weapons and made it vulnerable to toasters.
- @danielbryantuk: "When you discover something in your software that is hard to change, that is 'architecture' " @KevlinHenney #OreillySACon
- @david_perell: "Anybody who doesn’t change their mind a lot is dramatically underestimating the complexity of the world we live in.” - Jeff Bezos
- @Develop_D: @BenedictEvans it, like all Nokia phones, had to pass 1.5m drop to concrete... And it did. Turns out people don't care about that as much
- @RickByers: Eg. love this quote about TCP/IP winning: "Standards should be discovered, not decreed. Seldom has it worked any other way."
- @swardley: AWS close to doubling capacity per year, it'll be spending this years "gross profit" on investing in future years growth
- @ValaAfshar: First time ever, more websites were viewed on mobile devices and tablets than desktops. —@qz
- @swardley: OH "more people have gone to #openstack conferences than cores in production" ... ouch. Harsh.
- @lithinn: Love this - devs think operations are tools for running their code; devops think code is the glue for their infrastructure. #serverlessconf
- @UnrealCraig: Serverless Map Reduce on @awscloud with #Lambda & #S3 process 200GB for 39¢ with no servers to worry about :) Happy days #ServerlessConf
- nl: Here's the truth. If you want to work in ML, and are all excited about the (justified) deep learning hype, you are much better off learning decision trees, random forest and gradient boosting first, and then learning neural networks. Most of the time, for most non-binary data XGB/VW will outperform a NN, be easier to use and more interpretable.
- @pvblivs: 88% market share. -3.6% profits. That's the whole of the Android industry
- @SwiftOnSecurity: "Omg why did you minify Python" "No that's what perl looks like"
- @danielbryantuk: "Microservices encourage isolation and limit blast radius, but the security attack surface gets bigger e.g. network call" @samnewman #mucon
- @gtheofanous: “Finding the root cause of a failure is like finding a root cause of a success”- John Allspaw ... Stop blaming individuals!! #MUCON
- @chrisfralic: "Facebook and Google get 68% of US Online Ad dollars. Revenue from all others is shrinking, down 5% last quarter"
- @etherealmind: Perspective: AWS Qtr sales $3.4B. Cisco Qtr sales 12.64B. HPE 12.2B. Oracle 8.6B. While AWS is growing, its small overall.
- Stephen Orban: This experience led to a business case to save or reallocate more than $100 million in costs across all of News Corp (our parent company) by migrating 75% of our applications to the cloud as we consolidated 56 data centers into 6. While News Corp continues its Journey toward 75%, it was able to realize its savings target in about 2 years.
- sievebrain: "Yet another JVM language"? Seems like a strange way to present a very interesting project. Why not target the JVM - do people write "Yet another x86 assembly language" or "Yet another LLVM language"?
- @ewolff: I hate statements like "Microservices must be fully async". Architecture is trade-offs. There is no one size fits all. Think for yourself.
- @danielbryantuk: "Every system is a distributed system... The question is what is the distance between components" @viktorklang #swisscomsoftwareday
- @mfratto: Yep, seamless cloud portability is a sham unless devs take the time to isolate per cloud dependent features from the core.
- @vgaltes: I liked this sentence: a bunch of highly coupled microservices is just a monolith #serverlessconf
- @dsandler: there ain't no party like a Turing machine party cuz a Turing machine party may or may not stop
- @beaknit: @cmeik When I look up at the stars, it reminds me that god crafted vim with his own compiler
- @kellabyte: Having built a 1,000 node Cassandra+Solr cluster, it was way easier (not easy) to run than a 16 node ES cluster :P
- agonnaz: Design from the outside in. Design as much of the API as possible without writing any implementation code. If writing the code leads you to realize that you need to change the API, leave the implementation and redesign the API without changing any more implementation. That's how you design an API well. Design from the perspective of a user.
- dpark: You can't put absolute worst case times on code in practice. Worst case might be that someone managed to load 80PB of RAM on a CPU underclocked to 50 MHz. Or that the OS suspended the process in the middle of GC and was promptly frozen for a week while VMs were migrated across the country in the cup holder of someone's Toyota.
- @adriancolyer: David Parnas himself comments on today's post! Wow. And yes, he DOES consider making microservices source code available to others harmful!
- @adriancolyer: "the connections between microsvcs are *all* of the assumptions which the services make about each other..."
- nickpsecurity: Sure you can [give worst-case analysis]. It's standard in embedded software. Called WCET analysis. There's also real-time GC's. Go team has simply not done either of these in their design. So, they can't give a worst-time estimate.
- @antirez: An HN user nailed it about AMP: “It’s not your site anymore. You are just a free content provider to Google now”.
- @elanazak: Facebook has added the equivalent of Twitter's entire user base in a year.
- Dylan Beattie: Perhaps we should be optimising our communication patterns for discoverability instead of raw bandwidth; trading a little temporary velocity for some long-term efficiency.
- discodave: Google is the leader in large scale infrastructure for a single (or small set) of customers (Google, Youtube etc). You could call this "private hyper-scale cloud". AWS is the leader in vending that infrastructure to the rest of the world (a.k.a. public cloud).
- @garybernhardt: Slack (closed-source, third-party clients disallowed, maintains total control over third-party apps) lectures Microsoft to "be more open".

The true program is the programmer. Ralph Waldo Emerson: “The true poem is the poet's mind; the true ship is the ship-builder. In the man, could we lay him open, we should see the reason for the last flourish and tendril of his work; as every spine and tint in the sea-shell preexist in the secreting organs of the fish.”

Who would have thought something like this was possible? A Regex that only matches itself. As regexes go it's not even all that weird looking. One of the comments asks for a proof of why it works. That would be interesting.

Docker in Production: A History of Failure. Generated a lot of heat and some light. Good comments on HN and on reddit and on reddit. A lot of the comments say yes, there are problems with Docker, but end up saying something like...tzaman: That's odd, we've been using Docker for about a year in development and half a year in production (on Google Container engine / Kubernetes) and haven't experienced any of the panics, crashes yet (at least not any we could not attribute as a failure on our end).

federation means stasis while centralization means movement. A thoughtful elegy describing why Signal is not federated and why centralization has won. Reflections: The ecosystem is moving: cannibalizing a federated application-layer protocol into a centralized service is almost a sure recipe for a successful consumer product today. It's what Slack did with IRC, what Facebook did with email, and what WhatsApp has done with XMPP. In each case, the federated service is stuck in time, while the centralized service is able to iterate into the modern world and beyond...Early on, I thought we'd federate Signal once its velocity had subsided. Now I realize that things will probably never slow down, and if anything the velocity of the entire landscape seems to be steadily increasing...extensions don't mean much unless everyone applies them, and that's an almost impossible task in a truly federated landscape...an open source infrastructure for a centralized network now provides almost the same level of control as federated protocols, without giving up the ability to adapt.

Videos from Serverlessconf are now available. There's no shortage of recaps. Who’s Jeff?—Things we learned at #Serverlessconf London 2016, Serverless London 2016, ServerlessConf London: Recap, Recap on the Serverless Conf. Looks like a revolution...or at least an evolution.

Become a data analysis Jedi from lessons extracted from years of experience analyzing search logs at Google. It's not abstract advice. It details habits you can establish to do high quality work. Practical advice for analysis of large, complex data sets: Look at your distributions; Consider the outliers; Report noise/confidence; Look at examples; Slice your data; Consider practical significance; Check for consistency over time; Separate Validation, Description, and Evaluation; Confirm expt/data collection setup; Check vital signs; Standard first, custom second; Measure twice, or more; Check for reproducibility; Check for consistency with past measurements; Make hypotheses and look for evidence; Exploratory analysis benefits from end to end iteration; Data analysis starts with questions, not data or a technique; Acknowledge and count your filtering; Ratios should have clear numerator and denominators; Educate your consumers; Be both skeptic and champion; Share with peers first, external consumers second; Expect and accept ignorance and mistakes.

Mark Heath with a nice gloss on Patrick Debois debunking many of the supposed advantages of serverless. Is it really “better”? Is it really “faster”? Is it really “more reliable”? Is it really “more secure”? Is it really “cheaper”? Does it really offer us “better service”? All still arguable. The message seems to be that serverless has the potential to make a difference, but it is still early days.

Videos from Networking @Scale Boston are now available. Lots of interesting sounding talks: Networking @ Google; Introducing Akamai's Cloud Networking; Scaling Facebook Live.

Jessica Kerr with a delightful set of notes on Alan Kay's appearance at #CodeMesh. He's definitely a quote machine: Computing is kind of a pop culture. We don't care much about the past.

Epic Move: An AWS Exodus: For an early stage start-up, it's hard to beat the speed of delivery, operational off-loading, and agility that Amazon Web Services makes possible. But as some enterprises grow, there comes a critical point where those early benefits can all too quickly transform what was once the backbone of your minimum viable product into a super massive black hole sucking in all of your funding and driving your run rate through the roof. Join us for a glimpse into the motivations, false starts, successes, and lessons encountered by the Datto engineering team in their continuing mission to save the Backupify SaaS application from the brink of the AWS event horizon. A mission that would ultimately entail the creation of one of the world's largest OpenStack Swift clusters and the migration of more than 10 petabytes of data and a host of distributed services including PostgreSQL, Redis, and Cassandra.

Attack the cache. That seems to be a common system failure mode. When the cache breaks the system breaks. DRAMMER (DRAM Hammer) and the Mirai DNS attack.

Paul Johnston on why containers are not the answer. The future of serverless. Containers are essentially instances that are hidden. You don't have to run the server. That's the nice bit about them. You still have to write code. They are still monoliths. The advantage of Lambda is that if you have 50 Lambda functions and 3 of them do the heavy lifting, it will scale the heavy lifting ones and the rest will just sit there doing nothing. If you have a Docker container and you need to scale, and the three heavy lifting things are in that one container, it will scale the whole thing. It's about decomposing your system into specific functions and only scaling what you need.
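To put rough numbers on that scaling argument, here is a toy model. It is only a sketch: the function names, request rates, and per-instance capacity are all made up for illustration, and real Lambda or container schedulers are far more subtle than a ceiling division.

```python
import math

def per_function_instances(load, capacity):
    """Each function scales on its own: only busy functions get extra copies."""
    return {name: max(1, math.ceil(rps / capacity)) for name, rps in load.items()}

def monolith_instances(load, capacity):
    """Everything ships in one container, so the busiest function sets the
    replica count and every replica carries a copy of every function."""
    replicas = max(1, math.ceil(max(load.values()) / capacity))
    return {name: replicas for name in load}

# Hypothetical workload: 47 quiet functions and 3 heavy ones (requests/second).
load = {f"quiet_{i}": 1 for i in range(47)}
load.update({"resize": 900, "transcode": 700, "index": 500})
CAPACITY = 100  # assumed requests/second that one function copy can serve

fine_grained = sum(per_function_instances(load, CAPACITY).values())
coarse = sum(monolith_instances(load, CAPACITY).values())
print(f"function copies, scaled per function: {fine_grained}")
print(f"function copies, scaled as one container: {coarse}")
```

Under these assumed numbers the per-function model runs far fewer copies, because the 47 quiet functions never get replicated just to keep up with the 3 busy ones.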

The Brain's Now. Time and memory are linked. The more bandwidth in memory, the more time it seems to take. That's why time seems to slow down when you are scared. More is being remembered about the time interval so it takes longer to relive.

For a trip down nostalgia lane...Perl and the birth of the dynamic web. What's old is new again, serverless is basically just cloud washed CGI.

Extracting 25 TFLOPS from AWS Lambda, or #TheCloudIsTooDamnHard. Interesting things to note: The rate of submitting jobs is slow; Most lambda jobs start very quickly after host submission. The variance increases the more active running jobs there are; Setup can take a while -- for this short job, it's ~20% of the execution time; Some jobs finish incredibly quickly, suggesting they are running on faster / less-contested hardware; There are some real stragglers -- note the cluster of jobs finishing around 180s; the maximum number of simultaneous lambda workers is crunching on threads, which peaks at over 25 TFLOPS! This feels amazing for a bunch of plain-old python processes.

If you want to explain what serverless is to someone then this article by Peter Sbarski is a very good start: The essential guide to serverless technologies and architectures.

Serverless GraphQL: using Lambda and API Gateway to provide GraphQL APIs to React (and other) frontends. The isolation and scalability provided by Lambda makes it easy to fan out GraphQL resolvers and keep users from affecting each others’ performance and availability. It also means different teams can provide resolvers and all implement them in separate Lambdas.

At the heart of every program is some mechanism for managing IO. Here's A brief history of select(2): select came at the same time as sockets, but it wasn't implemented purely for networking. It seems that all of multiplexing, sockets and IPC came at about the same time, without a coherent grand design.
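For anyone who hasn't touched select in a while, here's a minimal sketch of the multiplexing idea using Python's standard `select` and `socket` modules. The helper name `ready_messages` and the two socket pairs are just illustration, not anything from the linked article.

```python
import select
import socket

def ready_messages(socks, timeout=1.0):
    """Return {socket: bytes} for every socket that select() reports readable."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return {s: s.recv(4096) for s in readable}

# Two connected socket pairs stand in for two independent clients.
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()

a_send.sendall(b"hello from a")  # only the first pair has data waiting

got = ready_messages([a_recv, b_recv])
payloads = list(got.values())
b_ready = b_recv in got
print(payloads, b_ready)

for s in (a_send, a_recv, b_send, b_recv):
    s.close()
```

One call watches both sockets and only the one with pending data comes back, so a single thread never blocks on an idle connection.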

With all this talk about APIs, why don't we each have our own personal APIs that other people can program to? Scary thought. Introducing BobAPI — A Personal API to Collect and Share All of My Life Data.

Moving queues into hardware gives a 20-fold performance improvement in test simulations. Using hardware queues to break the multi-core CPU bottleneck: We have to improve performance by improving energy efficiency. The only way to do that is to move some software to hardware. The challenge is to figure out which software is used frequently enough that we could justify implementing it in hardware. There is a sweet spot.

Thou shalt not over engineer. 10 Modern Software Over-Engineering Mistakes: The House (Business) Always Wins; Prefer Isolating Actions than Combining; Duplication is better than the wrong abstraction; Wrappers are an exception, not the norm. Don’t wrap good libraries for the sake of wrapping; Always take a step back and look at the macro picture; Concepts need shift in Mindset. Cannot be applied blindly like tools; TL;DRs should not be used everywhere; Don’t let <X>-ities go unchallenged. Clearly define and evaluate the Scenario/Story/Need/Usage; Reuse. Fork. Contribute. Reconsider; Refactoring is part of each and every story. No code is untouchable; Bad Estimation destroys Quality even before a single line of code is written.

Instant Messaging at LinkedIn: Scaling to Hundreds of Thousands of Persistent Connections on One Machine: we presented an overview of how we used Server-sent events for maintaining persistent connections with LinkedIn’s Instant Messaging clients. We also showed how Akka’s Actor Model can be a powerful tool for managing these connections on the Play Framework.
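Server-sent events are just a long-lived HTTP response carrying a simple text framing, which is part of why they hold up well at this scale. A small sketch of that framing follows. LinkedIn's implementation runs on Play and Akka, so this Python helper (`format_sse`, a name invented here) only illustrates the wire format, not their code.

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one event in the text/event-stream wire format.

    Each field goes on its own line, multi-line data becomes several
    'data:' lines, and a blank line terminates the event.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"

# Hypothetical instant-messaging event.
frame = format_sse('{"from": "alice", "text": "hi"}', event="new_message", event_id=7)
print(frame, end="")
```

The server keeps the connection open and writes one such frame per message. The `id` field lets a reconnecting client tell the server which event it saw last.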

Geoff Huston thinks the move to the edge will choke transit. The Death of Transit? (more in Network Break 111). Once upon a time traffic moved over the backbone, now CDNs are putting points of presence at the edge all over the world and companies like Google and Facebook are building their own backbone networks. So for companies like Level 3, and other Tier-1 transit networks, growth of bandwidth on the backbone is shrinking as content moves out to CDNs.

Top-down learning path: Machine Learning for Software Engineers: This is my multi-month study plan for going from mobile developer (self-taught, no CS degree) to machine learning engineer. My main goal was to find an approach to studying Machine Learning that is mainly hands-on and abstracts most of the Math for the beginner. This approach is unconventional because it’s the top-down and results-first approach designed for software engineers.

Both Apple and Google have something in common: moving up the S curve, trading pros for mass consumers. Apple just told the world it has no idea who the Mac is for. Google Has Quietly Dropped Ban on Personally Identifiable Web Tracking

Why Uber and Airbnb met very different outcomes in New York City: “What Airbnb wants to do is something that is opposed by broad constituencies,” said Micah Lasher, who worked for Attorney General Eric Schneiderman when he, another union ally, took on Airbnb. “That’s not true for Uber.”

Mark Callaghan shares how he finds performance problems. Make MyRocks 2X less slow. Often the problems are mutex contention related. This time it was bus-cycles. No, I've never heard of that problem either. Of greater interest are all the perf commands he used to track the problem down.

Step-by-step tutorial to build a modern JavaScript stack from scratch. Good how-to details.

It's the latency, stupid. Building a Shop with Sub-Second Page Loads: Lessons Learned: "It turns out that increasing bandwidth beyond 5 Mbps does not really have an effect on page load time at all. But decreasing latency of individual requests drives down page load time. That means doubling the bandwidth leaves you with the same load time, while cutting in half latency will give you half your load time." How do you reduce latency? Use persistent connections; Avoid redirects; Use HTTP/2 if possible. It comes with server push to transfer multiple resources for a single request, header compression to drive down request and response sizes, and request pipelining and multiplexing to send arbitrary parallel requests over a single connection; Set explicit caching headers for your static resources; Use a Content Delivery Network (CDN) to cache images, CSS, JS and HTML; Consider building a Single-Page App with a small initial page that loads additional parts asynchronously.
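A back-of-the-envelope model shows why latency dominates. This is a deliberately crude sketch: it treats a page load as a fixed number of sequential round trips plus raw transfer time, and the 40 round trips, 100 ms RTT, and 500 KB page are assumed values, not figures from the article.

```python
def page_load_seconds(round_trips, rtt_ms, page_kb, bandwidth_mbps):
    """Toy model: sequential round trips plus time to push the bytes through."""
    latency_cost = round_trips * rtt_ms / 1000
    transfer_cost = (page_kb * 8 / 1000) / bandwidth_mbps  # kilobytes -> megabits
    return latency_cost + transfer_cost

base = page_load_seconds(round_trips=40, rtt_ms=100, page_kb=500, bandwidth_mbps=5)
double_bandwidth = page_load_seconds(40, 100, 500, 10)
half_latency = page_load_seconds(40, 50, 500, 5)

print(f"baseline:         {base:.1f} s")
print(f"double bandwidth: {double_bandwidth:.1f} s")
print(f"half latency:     {half_latency:.1f} s")
```

With these assumptions, doubling bandwidth trims the load time by under 10%, while halving latency cuts it by more than 40%. Real browsers overlap requests, so the exact ratios differ, but the direction matches the quote.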

You would expect networked systems to have similar failure modes. How often have your distributed systems been in a coma? Study reveals a network within the brain that plays a role maintaining consciousness: the network between the brainstem and these two cortical regions plays a role maintaining human consciousness...the added value of thinking about coma as a network disorder is it presents possible targets for therapy, such as using brain stimulation to augment recovery.

Cool series of articles on Serverless Architecture – a practical implementation: IoT Device data collection, processing and user interface. How do you store and access the output of 14 security cameras for less than the $2250 a year Nest would charge? It involves Python, S3, raspberrypi, lambda, DynamoDB, REST, API Gateway, but the important part is how it all works together and that's explained in a very helpful amount of detail. Incredible job and a fun project!

An amazing in-depth explanation of NUMA, cache coherency, and the server memory hierarchy. Linux Performance in Cloud. NUMA (Non Uniform Memory Access): "architecture allows designing a bigger system configuration but at a cost of varying memory latencies. System designed with multiple cpu sockets is NUMA." Also lots of application design tips, like reducing TLB misses by using Linux HugePages.
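The HugePages tip comes down to simple arithmetic: the TLB holds a fixed number of translations, so bigger pages mean more memory covered before a miss. A quick sketch, where the 1536-entry TLB is an assumed figure for illustration (check your own CPU's specs):

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def tlb_reach_bytes(entries, page_bytes):
    """Memory that can be addressed without a TLB miss: entries x page size."""
    return entries * page_bytes

TLB_ENTRIES = 1536  # assumed TLB size, not taken from the article

reach_4k = tlb_reach_bytes(TLB_ENTRIES, 4 * KIB)  # default small pages
reach_2m = tlb_reach_bytes(TLB_ENTRIES, 2 * MIB)  # 2 MiB huge pages

print(f"4 KiB pages: {reach_4k / MIB:.0f} MiB of TLB reach")
print(f"2 MiB pages: {reach_2m / GIB:.0f} GiB of TLB reach")
```

Under that assumption the same TLB covers 512 times more memory with 2 MiB pages, which is why workloads with large, randomly accessed heaps tend to benefit.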

Steam Dev Days 2016 videos are now available. Will VR create a utopia or a dystopia? This is a question that interests Tim Sweeney, from Epic Games, in his optimistic vision of The Future of VR and Games. It's a PC powered revolution. Development tools haven't changed much over the years, but that's going to change, with development happening within VR. Within a few years all painting, sculpting, and modeling of objects, and play testing of 3d objects will be done in the VR medium directly. This will open up authoring to a much larger variety of users. Game technologies continue to escape into other industries. VR is the complete opposite experience of your phone. VR has a large field of view and much more realistic graphics. It will take a long time for high-end mobile VR to be a thing. Real-time multi-player experiences will be at consumer level of technology in a few years, which will lead to completely new types of VR experiences that haven't even been invented yet. He predicts that in 15 years there will be a world-wide installed base of 4 billion Oakley sunglass-sized VR rigs that immerse you in a full field of view experience with 8k pixels per eye, which is nearly indistinguishable from reality. These devices will be much cheaper than any TV. They are smaller and use much less material. He hopes the future metaverse that links all of humanity together will be based on open VR technology. VR needs a protocol and code base that are open, like the web. It doesn't exist yet, but he gives an outline of some of the requirements.

The main sources of maintenance costs with software (and microservices) by @adriancolyer at #mucon: Unstable interface; Implicit cross-module dependency; Unhealthy interface inheritance hierarchy; Cross-module cycle; Cross-package cycle.
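Of those, cycles are the easiest to check for mechanically. Here is a minimal sketch of finding one cross-module cycle with a depth-first search over a dependency map. The module names are hypothetical, and this is a generic graph technique, not the tooling discussed in the talk.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of modules, or None if acyclic.

    deps maps each module to the modules it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {module: WHITE for module in deps}
    path = []

    def visit(module):
        color[module] = GREY
        path.append(module)
        for dep in deps.get(module, ()):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[module] = BLACK
        return None

    for module in list(deps):
        if color[module] == WHITE:
            found = visit(module)
            if found:
                return found
    return None

# Hypothetical service dependencies.
services = {
    "billing": ["orders"],
    "orders": ["inventory"],
    "inventory": ["billing"],
    "ui": ["orders"],
}
print(find_cycle(services))
```

Run against a real codebase, the dependency map would come from import statements or service call logs. The search itself stays the same.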

Nobody makes an embedded device like this by accident. If you wanted to create a world-wide network of hackable devices with real IP addresses then making really cheap consumer devices is a good plan. Hacked Cameras, DVRs Powered Today’s Massive Internet Outage: “The issue with these particular devices is that a user cannot feasibly change this password,” Flashpoint’s Zach Wikholm told KrebsOnSecurity. “The password is hardcoded into the firmware, and the tools necessary to disable it are not present. Even worse, the web interface is not aware that these credentials even exist.”

Need a sound? WORLD’S LARGEST NATURAL SOUND LIBRARY NOW ONLINE. Loons are cool.

A deep dive into Hardware Entropy Generators by John Sloan, who has a very special obsession. Talks about entropy and how the random number generator in the Linux kernel works. The Linux kernel is constantly consuming entropy and it turns out to be really difficult to generate unpredictability in a system designed to be predictable. So it's hard to generate secure encryption keys. See the problem? VMs have trouble generating entropy. Systems early in the boot cycle have little entropy. The solution is one of many different kinds of hardware entropy generators. Most are $50-$100. For $1200 you can get a nifty quantum entropy device that uses a beam splitter.
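If you're curious how much entropy your own box thinks it has, Linux exposes the kernel's estimate under `/proc`. A small sketch, assuming a Linux host (elsewhere the helper just returns `None`). Newer kernels may report a fixed value here, so treat the number as a rough hint rather than a measurement.

```python
import os
from pathlib import Path

# Linux-specific file holding the kernel's entropy estimate, in bits.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")

def entropy_available():
    """Return the kernel's entropy estimate in bits, or None if unavailable."""
    try:
        return int(ENTROPY_FILE.read_text().strip())
    except (OSError, ValueError):
        return None

bits = entropy_available()
print("entropy estimate:", "unknown" if bits is None else f"{bits} bits")

# Key material should come from the OS generator, not from random.random().
key = os.urandom(32)
print("generated a", len(key) * 8, "bit key")
```

A freshly booted VM or embedded board is where a low estimate is most likely to show up, which is the case the hardware generators in the talk are meant to address.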

Like peanut butter and chocolate. Welcoming Adrian Cockcroft to the AWS Team.

This is what the future of failure looks like. It's complex. Envato Post Mortem report: 19 October 2016: This incident manifested as five “waves” of outages, each subsequent one occurring after we thought the problem had been fixed. In reality there were several problems occurring at the same time, as is usually the case in complex systems. There was not one single underlying cause, but rather a chain of events and circumstances that led to this incident.

Some Highlights from the O'Reilly Software Architecture Conference in London 2016.

When static doesn't mean simple. AWS Git-backed Static Website. It's great to see the whole process in one place, but that's a lot of moving parts. Too many? Or is this the essential complexity of the process laid bare?

A good set of MIT Open Courseware videos on AI.

Aeron continues to kick assymptopes. Benchmark Results: 24 October 2016: Aeron delivered over 19 million messages per second, completing the transfer 211 times faster than gRPC and 71 times faster than KryoNet. This is a material improvement since the prior benchmark report and is mostly due to buffer bounds checks being disabled.

Service discovery at Stripe. Nice description of how to use Consul for service discovery. Not a great out of the box experience, and it took a lot of work to integrate, but they seem happy with the result.

Interactive Analytics: Redshift vs Snowflake vs BigQuery. Periscope Data likes Redshift on both price and performance, but they give a very useful evaluation of Snowflake and BigQuery as well, including the complicated pricing models. BigQuery keeps query speeds fast even at very large dataset sizes, is easy enough to manage that analysts can do it, and reaching petabyte scale just works. With Snowflake you get competitively fast analytics performance out of the gate, without having to spend a lot of effort optimizing the system.

Serialization overhead strikes again. Tracing a Python performance bug on App Engine: "App Engine's older Python Datastore library is very slow when serializing big objects...Eventually we found a place where we could monkey patch their internal API to skip the conversion, and just return the "raw" data from the Datastore." Lessons: Per-request logs are a performance tool, but they don't need to be fancy; Don't layer abstractions that do the same thing; Reproduce hard bugs with the smallest amount of code possible; db deserialization is slower than ndb; db.Expando is twice as slow compared to db.Model, but for ndb it makes no difference; The Standard Environment has very fast protocol buffer serialization compared to the Flexible Environment.
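The "logs don't need to be fancy" lesson is easy to act on. Here's a minimal sketch of a per-request timer built on the standard library. The `timed` helper and the `timings` dict are names invented here, and `pickle` is only a stand-in for the Datastore conversion step, not what the App Engine library actually does.

```python
import pickle
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Record how long the wrapped block took, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)

timings = {}  # in a real handler this would be attached to the request

big_object = list(range(100_000))
with timed("serialize", timings):
    blob = pickle.dumps(big_object)
with timed("deserialize", timings):
    restored = pickle.loads(blob)

for label, samples in timings.items():
    print(f"{label}: {samples[-1] * 1000:.2f} ms")
```

Dumping that dict into the request log is often enough to show which step is eating the time before reaching for a profiler.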

Hacking the signal. CTCSS fingerprinting: a method for transmitter identification.

Good experience report by Evernote on Bringing Micro-Services to the Client Side: Project Ion and “Micro-Components”. On the client side they are moving from GWT to React, Redux, Webpack, and Babel to enhance their workflow. One reason given is they want to attract engineers by having a sexier stack. Who wants to work on GWT? This is a surprisingly common reason for switching stacks. The flower must attract the worker bee.

Lessons Learned From Working in the Clouds: Be Redundant; Keep Code Small and Simple; Plan For Failures; Use Cloud Services Where it Makes Sense; Test Yourself Before You Wreck Yourself. The motivating cautionary tale behind Be Redundant is sobering. It's not sufficient to back up to a different S3 bucket; back up to a different cloud. Why? "Code Spaces was a code-hosting service that was completely based in AWS. A hacker gained access to their AWS control panel and deleted all of their data and configurations. In a period of twelve hours the hacker deleted the company."

VMware Future Net videos are now available.

capitalone/cloud-custodian: a rules engine for AWS fleet management. It allows users to define policies to enable a well-managed cloud infrastructure that's both secure and cost optimized. It consolidates many of the ad hoc scripts organizations have into a lightweight and flexible tool, with unified metrics and reporting.

Adiabatic Quantum Computing Conference 2016 videos are now available.

probprog/anglican: a probabilistic programming system implemented in Clojure, both the programming environment and the language. The doc does a bad job of explaining what it's good for, so here's a good podcast with more info: ANGLICAN and Probabilistic Programming.

nmaggioni/gerph: A simple and blazing fast networked key-value configuration store written in Go.

Chain: Tasks and Channels for Reliable Intermittent Programs: Energy harvesting computers enable general-purpose computing using energy collected from their environment. Energy-autonomy of such devices has great potential, but their intermittent power supply poses a challenge. Intermittent program execution compromises progress and leaves state inconsistent. This work describes Chain: a new model for programming intermittent devices. "Chain asks an application developer to define a set of computational tasks that compute and exchange data through a novel way of manipulating the computer’s memory, called a channel. Chain guarantees that tasks execute correctly despite arbitrary power failures."

Transactions for Distributed Actors in the Cloud: We present a new transaction protocol that avoids this blocking by releasing all of a transaction’s locks during phase one of two-phase commit, and by tracking commit dependencies to implement cascading abort.
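The bookkeeping behind "tracking commit dependencies to implement cascading abort" can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the paper's protocol: `DependencyTracker` and its methods are hypothetical names, everything runs in one process, and locking, two-phase commit messaging, and failure handling are all omitted. The only idea shown is that a transaction which reads data from a not-yet-committed transaction (possible because locks were released early) must wait for it to commit and must abort if it aborts.

```python
class DependencyTracker:
    """Toy tracker of commit dependencies between transactions.

    Hypothetical sketch; names and structure are assumptions, not the paper's API.
    """

    def __init__(self):
        self.status = {}      # txn -> "active" | "committed" | "aborted"
        self.dependents = {}  # txn -> txns that read its uncommitted writes
        self.waits_on = {}    # txn -> txns it must see commit first

    def begin(self, txn):
        self.status[txn] = "active"
        self.dependents[txn] = set()
        self.waits_on[txn] = set()

    def depend(self, reader, writer):
        """Record that `reader` saw data written by `writer`."""
        if self.status[writer] == "aborted":
            self.abort(reader)
        elif self.status[writer] == "active":
            self.dependents[writer].add(reader)
            self.waits_on[reader].add(writer)
        # If the writer already committed, no dependency is needed.

    def commit(self, txn):
        """Commit only if every transaction we depend on has committed."""
        if self.status[txn] != "active":
            return False
        if any(self.status[w] != "committed" for w in self.waits_on[txn]):
            return False
        self.status[txn] = "committed"
        return True

    def abort(self, txn):
        """Abort `txn` and, transitively, everything that depends on it."""
        aborted, stack = [], [txn]
        while stack:
            t = stack.pop()
            if self.status[t] != "active":
                continue  # already aborted, or committed and therefore safe
            self.status[t] = "aborted"
            aborted.append(t)
            stack.extend(self.dependents[t])
        return aborted
```

In this sketch, a chain where t2 read from t1 and t3 read from t2 means aborting t1 aborts all three, while committing t1 first unblocks t2 and then t3.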

Minoca OS: a general purpose operating system written completely from the ground up. It’s intended for devices looking to conserve power, memory, and storage. It aims to be lean, maintainable, modular, and compatible with existing software.

Zooids: Building Blocks for Swarm User Interfaces: an open-source open-hardware platform for developing tabletop swarm interfaces. The platform consists of a collection of custom-designed wheeled micro robots each 2.6 cm in diameter, a radio base-station, a high-speed DLP structured light projector for optical tracking, and a software framework for application development and control.

Microsoft: A Cloud-Scale Acceleration Architecture: In this paper we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers, enabling network flows to be programmably transformed at line rate, enabling acceleration of local applications running on the server, and enabling the FPGAs to communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers...By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at comparable latency to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively.

An empirical study on the impact of C++ lambdas and programmer experience: Results afford some doubt that lambdas benefit developers and show evidence that students are negatively impacted in regard to how quickly they can write correct programs to a test specification and whether they can complete a task. Analysis from log data shows that participants spent more time with compiler errors, and have more errors, when using lambdas as compared to iterators, suggesting difficulty with the syntax chosen for C++.
