
Stuff The Internet Says On Scalability For November 25th, 2016

High Scalability

25 Nov 2016 — 10 min read

Hey, it's HighScalability time:

Margaret Hamilton was honored with the Presidential Medal of Freedom for writing Apollo guidance software. Oddly, she's absent from best programmers of all time lists.

If you like this sort of Stuff then please support me on Patreon.

98 seconds: before camera infected with malware; zeptosecond: smallest fragment of time ever measured; 50%: Google Cloud cheaper than AWS; 50%: of the world is on-line;

Quotable Quotes:
- @skamille: Sometimes I think that human societies just weren't meant to scale to billions of people sharing arbitrary information
- @joshk0: At @GetArbor we use #kubernetes to host a 30K QPS ad-tech serving platform. Maybe smaller than Pokemon Go but nothing to sneeze at.
- HFT Guy: 2016 should be remembered as the year Google became a better choice than AWS. If 50% cheaper is not a solid argument, I don’t know what is.
- Glenn Marcus: Hybrid [Progressive Web App] development takes 260% more effort man hours than Native development.
- Bruce Schneier: I want to suggest another way of thinking about it in that everything is now a computer: This is not a phone. It’s a computer that makes phone calls. A refrigerator is a computer that keeps things cold. ATM machine is a computer with money inside. Your car is not a mechanical device with a computer. It’s a computer with four wheels and an engine… And this is the Internet of Things, and this is what caused the DDoS attack we’re talking about.
- Bruce Schneier: I don’t like this. I like the world where the internet can do whatever it wants, whenever it wants, at all times. It’s fun. This is a fun device. But I’m not sure we can do that anymore.
- southpolesteve: [Lambda] is cheaper and simpler to operate than our previous ec2+Opsworks setup. We get code to production faster and spend more time on actual business problems vs infrastructure problems.
- Carlo Rovelli: Meaning = Information + Evolution
- chadscira: We have been using Rancher as well... It allowed us to move away from DO and AWS. Now most of our infra is from OVH :). It's been smooth sailing. Because of massive costs savings we were able to just reinvest it in our own redundancy. Also 12-factor apps are pretty damn resilient.
- Fiahil: Making separate [Google] accounts might not be enough considering they allegedly banned accounts related to each others by recovery address. Why would you think they would not do the same with accounts sharing occasionally the same laptop, the same ip address, and the same first and last name ?
- @swardley: Arghhh, one of those "can IBM beat Amazon?" .... the answer has three parts 1) the game has become harder 2) yes it could 3) no it won't
- fest: Replaying the sensor inputs and evaluating new estimated state is a really good way of debugging failures (because you can't just stop the system mid-air and evaluate internal state). It also helps with regression test suite and trying out new algorithms quickly.
- @Tibocut: «Institutions prefer to have trillions sitting still than redistributing them towards opportunities» @asymco https://youtu.be/nD8QszyiVTY at 2h45
- @AlanaMassey: A gathering of two or more average looking white men is referred to by biologists as "a podcast."
- @RyanHoliday: "How slow men are in matters when they believe they have time and how swift they are when necessity drives them to it." Machiavelli
- agataygurturk: We use route53 health checks to invoke API gateway and thus the backend Lambda.
- Paul Biggar: Yeah, BDSM. It’s San Francisco. Everyone’s into distributed systems and BDSM.
- @mims: Since the Apollo program, we've privatized the R&D that drives all innovation. That might be a problem.
- Backblaze: We have fewer drives because over the last quarter we swapped out more than 3,500 2 terabyte (TB) HGST and WDC hard drives for 2,400 8 TB Seagate drives. So we have fewer drives, but more data.
- @lee_newcombe: Fun finding from my talk earlier. 40 attendees: 37 on cloud, 3 about to start. Only one trying serverless. There's your opportunity folks
- Resilience Thinking: In resilient systems everything is not necessarily connected to everything else. Overconnected systems are susceptible to shocks and they are rapidly transmitted through the system. A resilient system opposes such a trend; it would maintain or create a degree of modularity.

Security expert Rob Graham with a stunning blow-by-blow Twitter story of a botnet infecting his brand new security camera. The whole process starts within 98 seconds of putting the camera on the internet, which is far faster than an ordinary mortal can configure the device to be secure. This was a cheap camera that had good reviews. At some point we need to think about all this too-cheap equipment as being funded by a Botnet Subsidy. It's almost too much of a coincidence that all these cheap devices, meant to be bought like candy in the mass consumer market, have such obviously poor security. Maybe it's not an accident? See also, Pre-installed Backdoor On 700 Million Android.

Their profit margin is your opportunity. With The Era of Cloud Price Discounts Is Fading and the cost of metal continuing to decrease, is now a good time to consider transitioning to bare metal on-premise type infrastructures? The incentives are now coming into alignment. Kubernetes: Finally...A True Cloud Platform by Sam Ghods, Co-founder, Box makes a good case for Kubernetes as the only truly portable infrastructure option.

This is both pure genius and a sure sign of the apocalypse. Exclusive Interview: How Jared Kushner Won Trump The White House. Democrats may have thought they had a technological lead because of the last presidential election, but it turns out they were fighting the last war. Technology changed and they did not. Old: targeting, organizing and motivating voters. New: Moneyball meets Social Media with a twist of message tailoring, sentiment manipulation and machine learning. If this presidential election could be represented as a battle between Peter Thiel and Eric Schmidt: Thiel triumphed. Traditional microtargeting is almost quaint. Now, using Facebook's ability to target users with dark posts, a newsfeed message seen by no one aside from the users being targeted, each user can be shown a world specifically tailored to push and prod their particular buttons. For an explanation see The Secret Agenda of a Facebook Quiz. That's why it's both genius and apocalyptical. Things will never be the same.

I have big fundamental problems with pricing structures that incentivize you to log less to save money. Slack explains their alerting philosophy. Syscall Auditing at Scale. Alerts are sent to an Elasticsearch cluster and ElastAlert is used to continuously query incoming data for alert generation and general monitoring.

IoT may be the regulation of us all. Security expert (the real deal) Bruce Schneier says The internet era of fun and games is over in a chilling talk he gave in front of the House of Representatives’ Energy & Commerce Committee. What has changed? Attack is easier than defense; There are new vulnerabilities in the interconnections; The internet empowers attackers; The economics don’t trickle down.

Very sad. A Crazy Miscalculation Doomed the Schiaparelli Lander. The crash was thought to be caused by bad sensor data that was fed into software that wasn't built to handle it. Was this a preventable design error? Should it have been more fault tolerant? Why does this keep happening? Good discussion on Hacker News, especially on the complexity of "sensor fusion" problems and the difficulty of knowing which sensors to trust. The most interesting issue is the potential role of outsourcing work to multiple parties so nobody had ultimate knowledge or responsibility for the working of the entire system.

Once apps were all the rage, now they just generate developer rage. Are we back to the future? Why Native Apps Really are Doomed: Native Apps are Doomed pt 2: Alibaba is the global leader in B2B trade. Recently, they upgraded to a PWA (Progressive Web Apps): 76% more web conversions; 30% more monthly active users on Android, 14% more on iOS; 4X higher interaction rate from Add to Homescreen.

Adrian Cockcroft on Cloud Trends — Where have we come from and where are we headed: With much faster hardware and more efficient messaging formats, we have low latency and high messaging rates. This makes it practical to compose applications of many simple single function microservices, independently developed and continuously deployed by cloud native automation; The most powerful p2.16xlarge instance type has raw performance of 70 Teraflops from about 40,000 GPU cores. And a little utopia: Unemployed workers can retrain on the latest technologies, and build up a reputation by entering coding contests and contributing to open source projects, as a pathway to new opportunities.

Very funny. And truish. Which makes it sad too. It's The Future. So I just need to split my simple CRUD app into 12 microservices, each with their own APIs which call each others’ APIs but handle failure resiliently, put them into Docker containers, launch a fleet of 8 machines which are Docker hosts running CoreOS, “orchestrate” them using a small Kubernetes cluster running etcd, figure out the “open questions” of networking and storage, and then I continuously deliver multiple redundant copies of each microservice to my fleet. Is that it? I’m going back to Heroku. See also, $15 Production Kubernetes Cluster on DigitalOcean.

Can cube-to-cube fighting with the Golang team be far behind? Dart Developer Summit 2016 Videos: Dart is the fastest growing language at Google. Teams switching to Dart report up to twice the productivity and development speed of what they had previously.

Swardley on Why the fuss about serverless? Part history, part intro to Swardley maps, part business strategy primer. Takeaways: Serverless will fundamentally change how we build business around technology and how you code; Containers are important but ultimately invisible subsystems and this is not where you should be focused. Disagree on DevOps. DevOps under serverless is different, but no less visible to the practical art of coding.

How do you deal with webhook failures? This is a variant of the more general dropped event problem. brandur talks about how Stripe does it, which is sort of how network management systems do it: Stripe has an "events" API that can be polled to receive the same content that you would have received via Webhook. If you missed some Webhooks due to an application failure, it's possible to page through it and look for omissions. I've spoken to at least one person integrating who had this sort of setup running as a regular process to protect against the possibility of dropped Webhooks. This usually works pretty well, but does start to break down at very large scale where events are being created faster than you can page back.
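
The reconciliation idea brandur describes can be sketched roughly as follows. This is a minimal illustration, not Stripe's client: `Event`, `fetch_page`, `make_fake_api`, and the cursor format are all hypothetical stand-ins for whatever paginated events endpoint you poll, and the fake API is an in-memory list. The `max_pages` cap is an assumption added to reflect the caveat that you can't page back forever when events arrive faster than you can read them.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; a real API returns much richer objects."""
    id: str
    type: str


# Hypothetical page fetcher: takes a cursor (None means "start at the newest
# event") and returns (events_on_this_page, next_cursor_or_None).
FetchPage = Callable[[Optional[str]], Tuple[List[Event], Optional[str]]]


def iter_events(fetch_page: FetchPage, max_pages: int = 100) -> Iterator[Event]:
    """Page backwards through the events API, stopping after max_pages."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        events, cursor = fetch_page(cursor)
        yield from events
        if cursor is None:  # no more pages
            return


def find_missed(fetch_page: FetchPage, seen_ids: Set[str],
                max_pages: int = 100) -> List[Event]:
    """Return events the API knows about that never arrived via webhook."""
    return [e for e in iter_events(fetch_page, max_pages)
            if e.id not in seen_ids]


def make_fake_api(events: List[Event], page_size: int = 2) -> FetchPage:
    """In-memory stand-in for a paginated events endpoint (illustration only)."""
    def fetch_page(cursor: Optional[str]) -> Tuple[List[Event], Optional[str]]:
        start = 0 if cursor is None else int(cursor)
        page = events[start:start + page_size]
        nxt = start + page_size
        return page, (str(nxt) if nxt < len(events) else None)
    return fetch_page


# Newest-first list of everything the provider emitted.
ALL_EVENTS = [
    Event("evt_5", "charge.succeeded"),
    Event("evt_4", "charge.failed"),
    Event("evt_3", "charge.succeeded"),
    Event("evt_2", "customer.created"),
    Event("evt_1", "charge.succeeded"),
]

# IDs our webhook handler actually recorded; evt_4 and evt_2 were dropped.
seen_via_webhook = {"evt_5", "evt_3", "evt_1"}

missed = find_missed(make_fake_api(ALL_EVENTS), seen_via_webhook)
for event in missed:
    print("replaying", event.id, event.type)
```

Run as a periodic job, something like this only needs a durable record of the event IDs your webhook handler has already processed. Replayed events should go through the same idempotent handler as live webhooks so that a late delivery and a replay don't double-apply.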

Kubernetes for everyone. Craig McLuckie and Joe Beda at KubeCon. An interview about Heptio, a new startup following the well-trod Open Source pattern of taking a technology you helped build at one company and turning it into a standalone company. This time it's Kubernetes. The idea is to lower the barrier of entry to Kubernetes. Make it easier to use, not just at the out-of-the-box experience level, but deeper, as a bridge between PaaS and IaaS. I don't know much about Craig, but Joe Beda has been a consistent source of light and wisdom on the interwebs, so best of luck! It's a good interview too. Lots of Kubernetes origin story. They talk about operation specialization, taking roles that were conflated together and breaking them out. There are three different roles. Cluster Operator: they get the cluster up and running and manage the cluster. Most of the work has been at this level. More effort needs to be targeted at Application Developers / Application Operations, the people who actually use the cluster. That's where they will put their focus, making it easier for enterprise developers. The monetization strategy seems to be still up in the air. Or at least that's what they are saying :-)

So true. Animats: "We [CockroachDB] accomplished this by splitting our master branch into two branches: the master branch would be dedicated to stability, freezing with the exception of pull requests targeting stability. All other development would continue in a develop branch." There was a time when most software was developed that way.

Here are a couple free books that might be of interest. Baron Schwartz has written another in his excellent series of free books. His newest is Estimating CPU Per Query With Weighted Linear Regression, which explains how you can compute CPU per query. There's also an Introduction to Apache Flink, by Ellen Friedman and Kostas Tzoumas, which looks to be a good introduction to the topic of streaming as well. Here are some free Computer Science video courses too.

RAD -> AMP -> RISE (Real-time, Intelligent, and Secure Systems). Building the next-generation big data analytics stack. Berkeley's AMP lab has been wildly successful in reaching its research goals, producing the popular Spark, a fast and general engine for large-scale data processing. It's stunning what free graduate student labor can produce. What you may not know is each of these research projects is on a five-year tour. After that they disband, a new research direction is established, and a new project is formed. Interesting observation: AMPLab in some sense was a flip of that relationship. If you considered RAD Lab as basically a setting where “machine learning people were consulting for the systems people”, in AMPLab, we did the opposite—machine learning people got help from the systems people in how to make these things scale.

When Linode bundles network bandwidth into their pricing it sure does look cheap. Pricing Comparison: Cloud Hosts. Though if you are using a CDN the savings may not be as attractive.

slackhq/go-audit: an alternative to the auditd daemon that ships with many distros. After having created an auditd audisp plugin to convert audit logs to json, I became interested in creating a replacement for the existing daemon.

CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels: We have successfully developed a practical concurrent OS kernel and verified its (contextual) functional correctness in Coq. Our certified kernel is written in 6500 lines of C and x86 assembly and runs on stock x86 multicore machines. To our knowledge, this is the first proof of functional correctness of a complete, general-purpose concurrent OS kernel with fine-grained locking.

Amazon has an updated AWS Well-Architected Framework 71 page paper, outlining high level cloud best practices. The audience target is CTOs so it's not a practical document. General design principles: Stop guessing your capacity needs; Test systems at production scale; Automate to make architectural experimentation easier; Allow for evolutionary architectures; Data-Driven architectures; Improve through game days. It then goes into a lot more detail on the five pillars: Security, Reliability, Performance Efficiency, Cost Optimization, Operational Excellence.

imatge-upc/detection-2016-nipsws: We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom on them.
