
Stuff The Internet Says On Scalability For April 21st, 2017

High Scalability

21 Apr 2017 — 12 min read

Hey, it's HighScalability time:

Which do you see: Machines freeing people? Lost jobs? Slavery? Hyperactive Skittles?
If you like this sort of Stuff then please support me on Patreon.

year 1899: “Nobody has to use the Internet”; 12MPH: Speed news of Lincoln's assassination traveled the US; $200 million: Lyft tips; 500: data structures and algorithms interview questions; %0.00244140625: Odds of 13 straight male Dr. Who regens; 100: gigafactories could power the world; 100K: bots on Messenger; 1 million: containers Netflix launched in one week; 5.2 trillion: 2014 US revenue; 52,129: iterations to converge on NFL schedule; 36 Gbps: Facebook's network in the sky;

Quotable Quotes:
- @mipsytipsy: "That doesn't sound hard. I could build that in a weekend."
- @Noahpinion: The Elon Musk Future is the good future. The Peter Thiel Future is the bad future. But honestly you'll probably get the Jeff Bezos Future.
- @BenedictEvans: In 2007 Google, Apple, Facebook & Amazon had maybe 50k staff between them. Today it's more like 400k.
- @AWSonAir: @Expedia inserting 70,000 rows per second of hotel data with Amazon Aurora.
- @swardley: STOP! If you're thinking of moving to cloud today (as in IaaS), you are so late that you need to consider moving to serverless ->
- David Rosenthal: Silicon Valley would not exist but for Ph.D.s leaving research to create products in industry.
- @cmeik: Distributed applications today treat the database like shared memory, and that's why we love things like Spanner. This is a flawed design.
- @Jason: Apple's cash hoard swells to five Teslas / four Ubers / 25 Twitters 😂 (aka $246.09 billion)
- Founder Collective: I don’t imagine these founders aspired to start businesses in video rental, orthodonture, or email deliverability as kids. The point is that passion can be found in a lot of places.
- @randomfrequency: OH: "We're doing soviet style agile - every two weeks there's a new five year plan" // Will Whittaker
- ragsoflight: It's worth pointing out that the combine is a notoriously bad predictor of how a prospect will perform in the NFL. It's not a stretch to posit the same thing about technical interviews that stress rote memorization and minutia.
- fbonetti: I'm currently working at a place that has virtually no process, which has its own challenges, but I'm happy that I'm not arguing about burn down charts anymore :)
- ryg: The good news is absolute DRAM latencies have gone down since the 80s – by a factor of about 4-5 or so. The bad news is that clock rates have increased by about a factor of 3000
- @peterbourgon: "Programming is made up of 2 activities: making decisions (95%), and typing (5%)"
- Steven Levy: Cannabis is to indoor growing as porn was to the internet
- @codinghorror: for GPU hash cracking, one 1080 Ti is worth more than three (!) AWS G2.8xlarge instances ($2.60/hour)
- Phillip Manwaring: Because everything outside of [solving your business’ problems] — load balancing, capacity planning, deployments, five 9 availability engineering — is undifferentiated heavy lifting
- @EnterprisingA: Anyone who takes a good look at AWS Greengrass without going "holy sh-t" doesn't get it
- @philnash: I got a 200-300ms improvement on render time using rel="preload" for fonts on philna.sh after reading @addyosmani's
- uiri: The people whom cdixon talks to that say they want to be '"working at or founding a startup"' are lying. They are lying to themselves first and foremost and only secondarily to cdixon. Their actions speak much louder than those words. Those actions say that they want the safety and security of the job that they don't enjoy rather than the risk and uncertainty of a startup.
- crusso: The phrase "this is why we can't have nice things" surfaces time and again in the face of those 1 in 20 sociopaths who interact with you as a business owner.
- empressplay: A core fundamental here is that people inherently misjudge the competency of their peers / the state of other projects / components / departments. Thus when they have a "problem" they make an assumption that their single point of failure won't be catastrophic (because everything else is okay), and that there's no need to sound an alarm over it (and potentially jeopardise their career).
- incapsula: there’s been a major shift over time in the motivation of the people behind the DDoS attacks. Instead of simply trafficking in spam, botnet operators have figured out a way to monetize their efforts through extortion or by launching a DDoS-for-hire platform like Mirai.
- rdtsc: Don't handle errors locally. Build a supervision tree where some part of the system does just the work it is intended to do (ex.: handling a client's request), and another (isolated) part does the monitoring and error handling. Have one process monitor others, one machine monitor another etc.

The private cloud takes a hit. Intel Pulls Out of OpenStack Effort It Founded with Rackspace.
- vishvananda, one of the original OpenStack developers, on the two mistakes of OpenStack: 1. We thought that private clouds were generally valuable. 2. We focused on community building by supporting all use cases. The end result of these mistakes is that OpenStack is good for certain use-cases. It does really well in large companies that need a public-cloud like environment to manage their infrastructure and can hire a team of people to manage it (e.g. comcast, verizon, e-bay, wal-mart). I don't know that the exodus is due to endemic problems so much as the market finally waking up and realizing that public cloud is the future.
- jldugger: IMO, OpenStack was sort of a consortium effort to compete with VMWare, AWS and Salesforce, but its operational model is closest to AWS. Nearly all the big enterprise IT companies tied their rafts together hoping to stave the flow of customers. It doesn't appear to have worked.
- crispyambulance: [Facebook is] basically trying to create a new class of transceiver. It remains to be seen if this will take off or not, but since it is part of the OCP effort, the chances are good that it will be taken seriously by QSFP vendors.
- jasode: I believe the decline of OpenStack is inherent in the type of technology. I don't believe that type of "infrastructure plumbing" software lends itself to high-speed high-quality innovation in the "open source" model.
- Netflix: We run a peak of 500 r3.8xl instances in support of our batch users. That represents 16,000 cores of compute with 120 TB of memory.
- WorldMaker: So yes, the purpose of OpenStack and Container technologies are very different and I appreciate that technically. In terms of real world value to me as a software developer, however, I have platform problems not infrastructure problems. I don't care what the infrastructure is under the service so long as it provides a stable, reliable platform for me to build upon. Containers abstract that for me in a way that solves real platform problems that OpenStack was only ever relevant to me in so far as its ability to once hint at a possible solution to.

You can understand deep learning in two hours says Rodney Brooks if you watch these two videos by Patrick Winston: 12a Neural Nets, and 12b Deep Neural Nets.

Every year, instead of April Fools', Reddit creates April Fun. See their latest episode in How We Built r/Place. Goal: 100,000 simultaneous users create a shared image by depositing tiles on a 1000 x 1000 tile grid. Storing the board in Cassandra with one row having 1 million columns was way too slow. They went with a clever two layer scheme. A representation of the board is encoded in bit fields and stored in Redis, cached in Fastly with a one second timeout. Each tile is also stored in Cassandra to make it available if Redis fails. Websockets were used to push updates to all clients, transmitting over 4 Gbps, 150 Mbps per instance and 24 instances. Also, people, put back-off in your retries! Result: striking. Good discussion, where else but on reddit.
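To make the "board encoded in bit fields" idea concrete, here is a minimal in-memory sketch. It assumes 4 bits per tile (a 16-colour palette, which the summary above does not state) and uses a plain bytearray as a stand-in for the Redis value. The class and method names are hypothetical, not taken from Reddit's code.

```python
BOARD_SIZE = 1000      # tiles per side, from the write-up
BITS_PER_TILE = 4      # assumption: a 16-colour palette


class PackedBoard:
    """Board packed two tiles per byte; a stand-in for a Redis bit field."""

    def __init__(self, size=BOARD_SIZE):
        self.size = size
        # Two 4-bit tiles fit in each byte, rounded up.
        self.data = bytearray((size * size + 1) // 2)

    def _locate(self, x, y):
        if not (0 <= x < self.size and 0 <= y < self.size):
            raise IndexError("tile out of bounds")
        index = y * self.size + x
        # Even tiles use the high nibble, odd tiles the low nibble.
        shift = 4 if index % 2 == 0 else 0
        return index // 2, shift

    def set_tile(self, x, y, colour):
        if not 0 <= colour < 2 ** BITS_PER_TILE:
            raise ValueError("colour does not fit in 4 bits")
        byte_index, shift = self._locate(x, y)
        # Clear the target nibble, then write the new colour into it.
        cleared = self.data[byte_index] & ~(0xF << shift) & 0xFF
        self.data[byte_index] = cleared | (colour << shift)

    def get_tile(self, x, y):
        byte_index, shift = self._locate(x, y)
        return (self.data[byte_index] >> shift) & 0xF


board = PackedBoard()
board.set_tile(0, 0, 5)
board.set_tile(1, 0, 9)
```

Under these assumptions the whole million-tile board is a 500,000 byte blob, which is small enough to hand to a cache in one piece.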

We switched to Amazon ECS and you won’t believe what happened next: Lower AWS bills. By switching to ECS we cut our EC2 bill in half; Better security and credentials management; Consistency across teams and services; Separating private infrastructure from otherwise sharable code; Less environment confusion.

Alan Kay on why People who are really serious about software should make their own hardware: if one is making something that is supposed to be good for people to use — that actually might help them in important ways — then the design needs to be in terms of humans-with-processes, and shouldn’t be limited by the particular hardware (and programming languages and systems) that vendors might be supplying.

Introducing tf-seq2seq: An Open Source Sequence-to-Sequence Framework in TensorFlow. If the significance of this doesn't resonate then take a look at Jeff Dean On Large-Scale Deep Learning At Google. It solves problems that can be framed as mapping one sequence to another, like language translation, which can then be composed with other techniques.

Are platforms the new company town? The new rentierism? You can always leave a platform...right? Isn't subscription just another word for rent? Interesting thought. See also Competition – Rethinking the Regulatory Framework.

A Comprehensive Guide To HTTP/2 Server Push. After a very good explanation it turns out push didn't make a big performance improvement. But Mark Mennell makes a good point: Server PUSH is only one feature of HTTP2. A bigger feature in my view is persistent connections, so at the TCP/IP layer the socket is kept open instead of having to reconnect each request – this has the biggest performance impact I understand

What's the difference between a message bus and a message queue? Bus or Queue. Not much anymore. Message queue: receives messages from an application and makes them available to one or more other applications in a first-in-first-out (FIFO) manner. Message bus: provides a way for one (or more) application to communicate messages to one or more other applications. Good discussion on StackOverflow.
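The two definitions can be sketched side by side. This is a toy, single-process illustration with hypothetical class names, not how any particular broker works: the queue hands each message to exactly one receiver in FIFO order, while the bus delivers each published message to every subscriber of a topic.

```python
from collections import defaultdict, deque


class MessageQueue:
    """FIFO queue: each message is consumed once, oldest first."""

    def __init__(self):
        self._items = deque()

    def send(self, message):
        self._items.append(message)

    def receive(self):
        # Returns None when nothing is waiting.
        return self._items.popleft() if self._items else None


class MessageBus:
    """Bus: every subscriber to a topic gets its own copy of each message."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers.get(topic, []):
            handler(message)


queue = MessageQueue()
queue.send("first")
queue.send("second")

bus = MessageBus()
billing, audit = [], []
bus.subscribe("orders", billing.append)
bus.subscribe("orders", audit.append)
bus.publish("orders", "order-1")
```

Real products blur the line, which is the point of "not much anymore": many queues offer fan-out, and many buses offer ordered, durable delivery.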

Wait, there's such a thing as participatory democracy? Yes. In Taiwan. And it's tech mediated. Facebook, isn't this something you can do? vTaiwan: Public Participation Methods on the Cyberpunk Frontier of Democracy. Unlike older ossified democracies, Taiwan has only been a democracy for 30 years; hopefully they can kickstart the rest of us. vTaiwan is a four-phase process with a set of methods that integrate technology, media, and facilitation: an artificial-intelligence facilitated conversation tool called pol.is is distributed through Facebook ads and stakeholder networks; a public meeting is broadcast where scholars and officials respond to issues that emerged in the conversation; an in-person stakeholder meeting co-facilitated by civil society and the government, and broadcast to remote participants; the Government agrees to bind its action to points that reached consensus, or provides a point-by-point explanation of why those consensus points are not (yet) feasible. This process was used to determine how Uber should enter Taiwan. It was used to overcome a six-year deadlock on online alcohol sales. Result: Taken as a whole, the process vTaiwan has created amounts to a rethinking of how citizens send signals on complex issues, and how government listens and decisions result. Consensus-building combined with facilitation to derive “coherent, blended volition.”

Does a good BaaS help quickly create complex apps? Seems so. The Buffer Retreat App Version 2: Migrating Tech Stacks, New Features and More! Nice work with Firebase with a Parse port thrown in.

How we fine-tuned HAProxy to achieve 2,000,000 concurrent SSL connections using only 48GB of RAM. Use nbproc to make HAProxy use multiple cores. Vegeta for load testing. Max open files set to 4 million. Max connections set to 2 million.
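A quick back-of-the-envelope check shows what those numbers imply per connection. This is only arithmetic on the figures quoted above. It assumes the full 48GB is available to connections and reads GB as GiB, neither of which the summary states.

```python
# Figures quoted in the summary above.
RAM_BYTES = 48 * 1024 ** 3        # assumption: "48GB" read as 48 GiB
MAX_CONNECTIONS = 2_000_000
MAX_OPEN_FILES = 4_000_000

# Memory budget per concurrent SSL connection if RAM were split evenly.
bytes_per_connection = RAM_BYTES / MAX_CONNECTIONS

# File descriptors allowed per connection under the open-files limit.
files_per_connection = MAX_OPEN_FILES / MAX_CONNECTIONS

print(f"{bytes_per_connection / 1024:.1f} KiB per connection")
print(f"{files_per_connection:.0f} file descriptors per connection")
```

That works out to roughly 25 KiB of RAM and two file descriptors per connection. The two-descriptor figure is consistent with a proxy holding one client-side and one server-side socket per connection, though that reading is an interpretation rather than something the summary says.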

Michael Barker: We've found that as our exchange volumes have increased the only protocol capable of handling a full un-throttled feed is ITCH (over multicast UDP). For all of our other stream based TCP feeds (FIX, HTTP) we are moving toward rate throttling and coalescing events based on symbol in all cases - we already do it in the majority of our connections. We maintain a buffer per connection (Disruptor or coalescing ring buffer depending on the implementation) so that the rate at which a remote connection consumes does not impact on any of the other connections. With FIX we also maintain some code that if we detect a ring buffer becoming too full (e.g. >50%) then we pro-actively tear down that connection under the assumption that their connection is not fast enough to handle the full feed or it has disconnected and we didn't get a FIN packet. If you have non-blocking I/O available, then you can be a little bit smarter regarding the implementation (unfortunately not an option with the standardised web socket APIs).
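The per-connection coalescing idea can be sketched in a few lines. This is a dictionary-based stand-in with hypothetical names, not the Disruptor and not the exchange's actual ring buffer: a newer event for a symbol overwrites the pending one, and the buffer reports when it is more than half full so the caller can tear the connection down.

```python
from collections import OrderedDict


class CoalescingBuffer:
    """Per-connection buffer that keeps only the latest event per symbol."""

    def __init__(self, capacity, teardown_ratio=0.5):
        self.capacity = capacity
        self.teardown_ratio = teardown_ratio   # the ">50%" rule from the quote
        self._pending = OrderedDict()

    def offer(self, symbol, event):
        """Store an event; return False once the buffer is too full."""
        # Overwriting an existing key keeps its place in line, so a slow
        # consumer sees one up-to-date event per symbol, not a backlog.
        self._pending[symbol] = event
        return len(self._pending) <= self.capacity * self.teardown_ratio

    def poll(self):
        """Return the oldest pending (symbol, event) pair, or None."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)


buffer = CoalescingBuffer(capacity=4)
buffer.offer("EURUSD", 1.07)
buffer.offer("GBPUSD", 1.28)
buffer.offer("EURUSD", 1.08)            # coalesces with the earlier EURUSD
healthy = buffer.offer("USDJPY", 109.0)  # third distinct symbol, over 50%
```

Each connection would own one such buffer, so a slow reader only ever fills its own and never holds up the others.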

Great summary. Key Takeaway Points and Lessons Learned from QCon London 2017.

Memory bandwidth: We get about 11.7 bytes/cycle per SM, so about 4x what the i7-7700K core gets; that sounds good, but each SM drives 128 “CUDA cores”, each corresponding to a thread in the SIMT programming model. Per thread, we get about 0.09 bytes of memory bandwidth per cycle – or perhaps less awkward at this scale, one byte every 11 instructions. That, in short, is why everything keeps getting more and larger caches, and why even desktop GPUs have quietly started using tile-based rendering approaches (or just announced so openly). Absolute memory bandwidths in consumer devices have gone up by several orders of magnitude from the ~1MB/s of early 80s home computers, but available compute resources have grown much faster still, and the only way to stop bumping into bandwidth limits all the time is to make sure your workloads have reasonable locality of reference so that the caches can do their job.
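The per-thread figure follows directly from the two numbers in the quote. A short check, using only those numbers:

```python
# Figures from the quote above.
BYTES_PER_CYCLE_PER_SM = 11.7
THREADS_PER_SM = 128   # "CUDA cores" driven by each SM

# Share of memory bandwidth available to each thread, per cycle.
bytes_per_cycle_per_thread = BYTES_PER_CYCLE_PER_SM / THREADS_PER_SM

# Inverted: how many cycles pass per byte of bandwidth for one thread.
cycles_per_byte = THREADS_PER_SM / BYTES_PER_CYCLE_PER_SM

print(f"{bytes_per_cycle_per_thread:.2f} bytes/cycle per thread")
print(f"one byte every {cycles_per_byte:.0f} cycles")
```

This reproduces the quoted 0.09 bytes per cycle and one byte roughly every 11 cycles, which is why locality and caches matter so much at this scale.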

It might be kismet if you watch these videos from Facebook's 2017 F8 conference.

If you are experiencing lag over your wifi it might be because of bufferbloat. There are some modems that implement protocols to fix that. Too subtle to gloss; learn more at Home products that fix/mitigate bufferbloat…

Nirvana is about mastering passions, don't think that can be found in the cloud. A serverless nirvana? Microsoft Azure CTO Mark Russinovich on the future of the cloud.

Things are different now. If this could be done at all in the past it would have cost millions of dollars and taken a very large team. Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning: In all, this entire round of researching, productionization, and refinement took about 8 months, at the end of which we had built and deployed a state-of-the-art OCR pipeline to millions of users using modern computer vision and deep neural network techniques. Our work also provides a solid foundation for future OCR-based products at Dropbox.

Are you in the mood for some JavaScript positivity? JavaScript: What excites me in 2017: Reason is a transformed view of OCaml that makes it look a bit like ES2016 without JavaScript’s bad parts; Rust; Web Assembly; GraphQL and Relay; Scale with Docker, Now.sh & GitHub Pages; Houdini; WebRTC; Privacy: Using IndexedDB, Service Workers and WebRTC; Decentralized currencies.

When things work well it's because of good design. Twit 610. Delta uses mobile technology to avoid the United problem. When you check in, Delta asks how much money you would take to be bumped, so it can pick which people to bump in least-cost order.
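Picking in least-cost order is just a sort over the bids collected at check-in. A minimal sketch, with a hypothetical function name and made-up bids, not Delta's actual system:

```python
def pick_passengers_to_bump(bids, seats_needed):
    """Choose the cheapest volunteers.

    bids maps passenger name to the dollar amount they said they would accept.
    Returns the chosen names in least-cost order and the total payout.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1])
    chosen = ranked[:seats_needed]
    names = [name for name, _ in chosen]
    total = sum(amount for _, amount in chosen)
    return names, total


# Made-up bids for illustration only.
example_bids = {"ana": 400, "raj": 200, "lee": 300}
names, total = pick_passengers_to_bump(example_bids, seats_needed=2)
```

Because the amounts are collected up front, the airline knows the cost of an oversold flight before anyone reaches the gate.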

It's SOP to run live traffic through new code to test it. Netflix did it for six months! The Evolution of Container Usage at Netflix: Supporting customer facing services is not a challenge to be taken lightly. We’ve spent the last six months duplicating live traffic between virtual machines and containers. We used this duplicated traffic to learn how to operate the containers and validate our production readiness checklists. This diligence gave us the confidence to move forward making such a large change in our infrastructure.

Reclaiming JVM memory is part of your birthright as a container user. Java RAM Usage in Containers: Top 5 Tips Not to Lose Your Memory: A new JVM option (-XX:+UseCGroupMemoryLimitForHeap) automatically sets Xmx for a Java process according to memory limit defined in cgroup; Limiting the size of the metadata is important, especially if you are having OOM issues. Do this with the special option -XX:MaxMetaspaceSize;

Much harder said than done. In a very real sense the code you write is you because it embodies your thought, which is a detached part of your essence. You Are Not The Code You Write: Criticism to your code is not criticism to you.

uber/jaeger (article): Uber's Distributed Tracing System.

NebulousLabs/Sia: Blockchain-based marketplace for file storage. A new decentralized cloud storage platform that radically alters the landscape of cloud storage. By leveraging smart contracts, client-side encryption, and sophisticated redundancy (via Reed-Solomon codes), Sia allows users to safely store their data with hosts that they do not know or trust.

ChrisRx/dungeonfs: A FUSE filesystem and dungeon crawling adventure game engine.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications: We [Google] present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy.
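The saving from depth-wise separable convolutions can be worked out with simple multiply-add counts. The sketch below compares a standard convolution with a depthwise pass followed by a 1x1 pointwise pass. The layer sizes are illustrative assumptions, not taken from the abstract above.

```python
def standard_conv_cost(kernel, in_channels, out_channels, feature_map):
    """Multiply-adds for a standard convolution over a square feature map."""
    return kernel * kernel * in_channels * out_channels * feature_map * feature_map


def separable_conv_cost(kernel, in_channels, out_channels, feature_map):
    """Multiply-adds for a depthwise convolution plus a 1x1 pointwise one."""
    # Depthwise: one kernel x kernel filter per input channel.
    depthwise = kernel * kernel * in_channels * feature_map * feature_map
    # Pointwise: 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels * feature_map * feature_map
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 channels in and out, 14x14 feature map.
standard = standard_conv_cost(3, 512, 512, 14)
separable = separable_conv_cost(3, 512, 512, 14)
ratio = separable / standard   # equals 1/out_channels + 1/kernel**2
```

For 3x3 kernels the ratio is dominated by the 1/9 term, so the separable form needs somewhere around an eighth to a ninth of the computation for a layer like this one.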

Competitive Programmer’s Handbook: The book is especially intended for students who want to learn algorithms and possibly participate in the International Olympiad in Informatics (IOI) or in the International Collegiate Programming Contest (ICPC). Of course, the book is also suitable for anybody else interested in competitive programming.

Trade-Offs Under Pressure: Heuristics and Observations Of Teams: This study explores what heuristics or rules-of-thumb engineers employ when faced with an outage or degradation scenario in a business-critical Internet service. A case study approach was used, focusing on an actual outage of functionality during a high period of buying activity on a popular online marketplace. Heuristics and other tacit knowledge were identified, and provide a promising avenue for both training and future interface design opportunities.
