Stuff The Internet Says On Scalability For July 20th, 2018

Hey, it's HighScalability time:

World History Timeline from 3000BC to 2000AD. Yet we still program with text—in files (Schofield and Sims)

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • $150 billion: Bezos Prime; 49%: Amazon's share of US e-commerce; 1,000 terabytes: image size representing one cubic millimeter of brain tissue; 7x: reduction in the cost of computing power over 4 years; 25x: faster code using SIMD; 4TB: RAM in GCE “ultramem” instance type; 4 months: half-life of an ICO; 80%: cost savings moving from AWS to DO; 130,000: square feet in the biggest vertical farm; 14x: price increase for Google Maps

  • Quotable Quotes:
    • Chappell: In [CPU] architectures, we believe that aggressive specialization is a part of the answer to what happens next. That’s mapping applications to the specific architectural choices. And you already see that in machine learning, where there’s a really hot field in terms of deep neural nets and other implementations. The third wing of the architecture piece is the “domain specific system-on-chip.”
    • Troy Hunt: I love the way these two services work in unison: Azure Functions to ensure you can scale immediately without being bound by logical infrastructure, deployment slots that make it easy to test and rollover new releases with zero downtime, then of course Cloudflare Workers to give you heaps of control over traffic flow for testing and rollover purposes and all protected using their rate limit service. 
    • Cliff Click: The JVM is very good at eliminating the cost of code abstraction, but not the cost of data abstraction. That means multiple data indirections mean multiple cache misses. They are very expensive. This is where your performance goes. 
    • jaybo_nomad: The Allen Institute for Brain Science is in the process of imaging 1 cubic mm of mouse visual cortex using TEM at a resolution of 4nm per pixel. The goal is to complete this in about 4 months running in parallel on 5 scopes. 
    • @lowrykoz: Stolen from a co-worker "Every company has a test environment. Some are lucky enough to also have a production environment."  Thanks Charlie for my laugh today!
    • DonHopkins: NeWS differs from the current technology stack in that it was all coherently designed at once by James Gosling and David Rosenthal, by taking several steps back and thinking deeply about all the different problems it was trying to solve together. So it's focused and expressed in one single language, instead of the incoherent fragmented Tower of Babel of many other ad-hoc languages that we're stuck with today.
    • tef: You can use a message broker to glue systems together, but never use one to cut systems apart.
    • @CompSciFact: 'Fancy algorithms are slow when n is small, and n is usually small.' -- Rob Pike
    • @Grady_Booch: I interviewed John Backus shortly before his death. He told me his work in functional programming languages failed, and would likely always fail, because it was easy to do hard things but incredibly difficult to do simple things.
    • Mark LaPedus: MRAM, a next-generation memory type, is being touted as a replacement for embedded flash and cache applications. MRAM works in consumer applications, but it’s still unclear if it will ever meet the temperature requirements for automotive. Some say MRAM will never work in automotive.
    • crabbone: This is the prism through which Java programmers view the world. They never question this belief. It also works well to justify the acquisition of more servers to investors: investing tons of effort into IT, building complicated deployment and clustering software, etc. This is what happens in the company I work for, and in a couple of mid-size to big companies I worked for before. The truth about it is that Java only gets you a good bang for your buck just a wee bit before it hits OOM. It doesn't scale down, to a degree that is ridiculous and painful. A typical example of a modern "microservices-inspired" Java application would function along these lines...
    • Netflix: We observed during experimentation that RAM random read latencies were rarely higher than 1 microsecond whereas typical SSD random read speeds are between 100–500 microseconds. For EVCache our typical SLA (Service Level Agreement) is around 1 millisecond with a default timeout of 20 milliseconds while serving around 100K RPS. During our testing using the storage optimized EC2 instances (I3.2xlarge) we noticed that we were able to perform over 200K IOPS of 1K byte items thus meeting our throughput goals with latency rarely exceeding 1 millisecond. This meant that by using SSD (NVMe) we were able to meet our SLA and throughput requirements at a significantly lower cost.
    • paulddraper: The biggest lesson HN teaches for designing large scale systems is "use a large scale system someone else has already designed".
    • Peter Zaitsev: While “normal” buffered IO is indeed quite fast, this [SSD] drive really hates fsync() calls, with a single thread fsync() latency of 3.5ms or roughly 300 fsync/sec. That’s just slightly more than your old school spinning drive.
    • J. Doyne Farmer: Economic failures cause us serious problems. We need to build simulations of the economy at a much more fine-grained level that take advantage of all the data that computer technologies and the Internet provide us with. We need new technologies of economic prediction that take advantage of the tools we have in the 21st century.  
    • Vadim Tkachenko: Intel Optane is a very capable drive, it is easily the fastest of those we’ve tested so far. MySQL 8 is not able to utilize all the power of Intel Optane, unless you use unsafe settings (which to me is the equivalent of driving 200 MPH on a highway without working brakes). Oracle has focused on addressing the wrong IO bottlenecks and has overlooked the real ones.
    • Ted Kaminski: The biggest security innovation of the past 20 years has been creating more security layers. Hardware (like TPMs) can protect itself from software. The kernel can protect itself even from root (e.g. secure boot). Root can protect itself from users. Users can protect their own accounts from their own software with sandboxing. Reduce the value of a compromise. Increase the cost of an attack.
    • Ann Steffora Mutschler: In experiments, the molecular clock averaged an error under 1 microsecond per hour, comparable to miniature atomic clocks and 10,000 times more stable than the crystal-oscillator clocks in smartphones, according to the team.
    • djhworld: We used to run an architecture similar to this a few years ago. I work for a broadcaster and unfortunately it failed badly during a big event. The Kinesis stream was adequately scaled, but the poller between Kinesis -> Lambda just couldn't cope. This was discovered after lots of support calls with AWS. It might be better these days, I don't know. We moved to using Apache Flink + Apache Beam, which has a lot more features and allows us to do stuff like grouping by a window, aggregation, etc.
    • mozumder: What kind of numbers are they talking about for it to be "large-scale"? One well designed fast app server can serve 1000 requests per second per processor core, and you might have 50 processor cores in a 2U rack, for 50,000 requests per second. For database access, you now have fast NVMe disks that can push 2 million IOPS to serve those 50,000 accesses. 50,000 requests per second is good enough for a million concurrent users, maybe 10-50 million users per day. If you have 50 million users per day, then you're already among the largest websites in the world. Do you really need this sort of architecture for your startup system?
    • colmmacc: AWS engineer here, I was lead for Route 53. We generally use 60 second TTLs, and as low as 10 seconds is very common. There's a lot of myth out there about upstream DNS resolvers not honoring low TTLs, but we find that it's very reliable. We actually see faster convergence times with DNS failover than with BGP/IP Anycast. That's probably because DNS TTLs decrement concurrently on every resolver with the record, but BGP advertisements have to propagate serially network-by-network. The way DNS failover works is that the health checks are integrated directly with the Route 53 name servers. In fact every name server is checking the latest healthiness status every single time it gets a query. Those statuses are basically a bitset, being updated /all/ of the time. The system doesn't "care" or "know" how many health statuses change each time; it's not delta-based. That's made it very very reliable over the years. We use it ourselves for everything. Of course the downside of low TTLs is more queries, and we charge by the query unless you ALIAS to an ELB, S3, or CloudFront (then the cost of the queries is on us).
    • tribune: As a member of the relatively hip 20-something social set in New York, I can say with confidence that Instagram is "where it's at" with this group. The introduction of stories and other Snapchat-like features has dramatically increased Instagram's usage among my friends (and definitely snagged plenty of app-hours from the former). Like the article says, Instagram is absolutely an escape from the hostile interfaces and hostile posts that abound on core Facebook. It's still cool and it's still fun. Whatever the internal power dynamics, it seems unlikely that Facebook would be stupid enough to throw this away. What do they care if their bread gets buttered through the Instagram app?

  • What's the biggest threat causing data loss? War? Earthquakes? Flood? No! A bad credit card. Backblaze Durability is 99.999999999% — And Why It Doesn’t Matter.

  • Chess players burn about 100 calories every 45 minutes. Walking a mile scorches through 100 calories. So you might reasonably conclude that a programmer who thinks that hard for 7.5 hours burns 1000 calories in a day, the equivalent of walking 10 miles, through the power of thought alone. That's your comeback when someone says “Bro, you don’t work hard. I just worked a 4700-hour week digging a tunnel under Mordor with a screwdriver.” Programming Sucks

  • When things break we always get to learn a little more about how they work. Google Cloud Networking Incident #18012: Google’s Global Load Balancers are based on a two-tiered architecture of Google Front Ends (GFE). The first tier of GFEs answers requests as close to the user as possible to maximize performance during connection setup. These GFEs route requests to a second layer of GFEs located close to the service which the request makes use of. This type of architecture allows clients to have low latency connections anywhere in the world, while taking advantage of Google’s global network to serve requests to backends, regardless of the region in which they are located...The GFE development team was in the process of adding features to GFE to improve security and performance. These features had been introduced into the second layer GFE code base but not yet put into service. One of the features contained a bug which would cause the GFE to restart; this bug had not been detected in either testing or the initial rollout.

  • A high-level overview. Public Cloud Services Comparison (June 25th, 2018)

  • netdev day 2: moving away from "as fast as possible" in networking code: The main thing [Van Jacobson] said has changed about the internet is – it used to be that switches would always have faster network cards than servers on the internet. So the servers in the middle of the internet would be a lot faster than the clients, and it didn’t matter as much how fast clients sent packets...Here’s an idea that was really surprising to me – sending packets more slowly often actually results in better performance (even if you are the only one doing it)...So why is sending data more slowly better? Well, if you send data faster than the bottleneck for the link, what will happen is that all the packets will pile up in a queue somewhere, the queue will get full, and then the packets at the END of your stream will get dropped. And, like we just explained, the packets at the end of the stream are the worst packets to drop! So then you have all these timeouts, and sending your 10MB of data will take way longer than if you’d just sent your packets at the correct speed in the first place. Also, netdev day 1: IPsec!
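Linux has since grown direct support for this idea: the fq qdisc paces individual flows, and a socket can cap its own rate with the SO_MAX_PACING_RATE option. A minimal sketch, assuming a reasonably recent Linux kernel with fq enabled (Python's socket module doesn't export the constant, so its numeric value is hardcoded here):

```python
import socket

SO_MAX_PACING_RATE = 47  # Linux-only option; value from asm-generic/socket.h

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask the kernel to pace this flow at ~1 MB/s instead of bursting at line
# rate and piling packets into a queue at the bottleneck link.
sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, 1_000_000)
sock.connect(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
```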

  • If you don't think a lack of legal protections for privacy has consequences...A Chinese university suspended a student's enrolment because of his dad's bad social credit score.

  • If you ain't cheatin you ain't tryin. But I can't even imagine trying to debug a program hardened against cheating like this. Riot's Approach to Anti-Cheat: We don’t share the state of other players if it doesn’t need to be shared, so we can avoid common cheats like “map hacks” (revealing all players on the map). We let the server’s game simulation make the authoritative game decisions and generally don’t trust the information received from the client, which helps prevent common cheats like “god mode” and “disconnect hacks,” barring any overlooked exploits. Our network protocol has been obfuscated, and we change this obfuscation regularly so that making a network-level bot is much more difficult. We set out to make tampering with the League of Legends game client more difficult by encrypting the game code. While all of our code was now protected, the globals and class members - whose pointers and values resided in the .data section of the PE file and the heap, respectively - were not.

  • Bandwidth costs are where they get you. Benchmarking AWS, DigitalOcean, Linode, Packet and Vultr: The launch of Exocraft.io and the subsequent flood of traffic revealed the flaw of our reliance on Amazon Web Services -- bandwidth cost. Within a month of launching, our costs surged nearly 10x entirely due to bandwidth costs on our EC2 servers...The goal of these tests was to find a way to lower our costs without sacrificing too much in performance. DigitalOcean accomplished both of those and more by besting EC2 in every category but CPU (which was quite close as well). We've since moved Exocraft.io to DigitalOcean and have realized an over 80% reduction in cost, which will only grow as our bandwidth usage increases...Some will argue that the best price-to-performance possible is with dedicated servers. This certainly can't be denied, but it ignores the many benefits hosting in the cloud offers. We've written our backend and structured our infrastructure to take full advantage of the ease with which cloud servers can be scaled up and down (both vertically and horizontally).

  • IO bound? Running parallel on the client works fine. CPU bound? Run on a server to parallelize the work. That's what Airbnb did, which means Node.js could not be the complete answer. Operationalizing Node.js for Server Side Rendering: The solution is to use a buffering reverse proxy to handle communication with the clients. For this, we use nginx. Nginx reads the request from the client into a buffer and passes the full request to the node server only after it has been completely read. This transfer happens locally on the machine over loopback or unix domain sockets, which are faster and more reliable than communication between machines. With nginx handling reading the requests, we are able to achieve higher utilization of the node processes. We also use nginx to handle some requests without ever having to go to the Node.js processes.
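The nginx half of that setup needs very little configuration. A hedged sketch of the shape of it (upstream name and socket path invented, not Airbnb's actual config):

```nginx
upstream node_ssr {
    # local Node.js render processes, reached over a unix domain socket
    server unix:/var/run/node-ssr.sock;
}

server {
    listen 80;

    location / {
        # read the entire client request into a buffer before opening the
        # upstream connection, so slow clients can't tie up a Node process
        proxy_request_buffering on;
        # likewise buffer the response so Node can finish and move on
        proxy_buffering on;
        proxy_http_version 1.1;
        proxy_pass http://node_ssr;
    }
}
```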

  • There's already a complaint that Amazon learns what products to build next by watching what succeeds in AWS. Imagine what could be learned by embedding servers everywhere. New – EC2 Compute Instances for AWS Snowball Edge.

  • Snitch: Putting consistency back into S3. You might not realize S3 is eventually consistent, and that can cause problems. Box: Both the Dedupe and the ETL process rely on the list operation of S3 to find and move objects. In the face of eventual consistency, the list operation can leave behind objects which will then never be moved to the target bucket in S3 and hence be lost forever. Similarly, ETL can fail to process and move data to RedShift if list operations are not strongly consistent. Given the rate at which we write to S3, eventual consistency is not just a theoretical phenomenon. We observe it every day.
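The failure mode is easy to picture in code. A toy boto3 sketch (bucket and key names invented) of the list-after-write pattern that bites them, given S3's listing semantics at the time:

```python
import boto3

s3 = boto3.client("s3")

# Writer: lands a new object in the staging area.
s3.put_object(Bucket="example-etl-bucket", Key="staging/part-0001", Body=b"rows")

# Mover: trusts LIST to enumerate everything written so far. With an
# eventually consistent LIST, the fresh key may simply not appear yet, so
# the mover "finishes" without it and the object is silently left behind.
resp = s3.list_objects_v2(Bucket="example-etl-bucket", Prefix="staging/")
keys = [obj["Key"] for obj in resp.get("Contents", [])]
assert "staging/part-0001" in keys  # can fail right after the write
```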

  • This hurts. My probot.us never took off at all. So much work down the drain. People don't seem to like asking questions via chatbots. Chatbots were the next big thing: what happened? The age-old hype cycle unfolded in familiar fashion…Reverential TechCrunch articles were written. Prophetic thought leaders like Chris Messina chimed in. Silicon Valley salivated at the prospect of talking to smart automation. Messenger began to overflow with bots. Slack went through exponential growth and even launched a fund for bot investment.

  • Good explanation of an actor system. Scalability at BUX. Using an actor system to go from 100 users to 1.5 million.

  • Can the private cloud be a thing if Intuit is willing to lose $75 million to $85 million to get out of running its own datacenter business? Intuit sells its largest data center amid move to AWS: Intuit is among the higher profile software vendors moving to Amazon Web Services. Intuit noted that AWS was able to handle peak tax season traffic and that gave it confidence to get out of the data center business.

  • Most methods of improving performance involve some sort of sharing. It's ironic that sharing, considered a social good, is an attack vector. Shared state has always been a bug vector. The solution? Become anti-social. Don't share. Isolate. Separate. Mitigating Spectre with Site Isolation in Chrome. Speculative Buffer Overflows: Attacks and Defenses. Performance decreases, certainly, but we lose a little bit of innocence too.

  • Databases are the hammer of the programming world. Building Distributed Locks with the DynamoDB Lock Client: DynamoDB supports mechanisms, like conditional writes, that are necessary for distributed locks. However, the AWS SDK doesn’t include the logic needed to actually implement distributed locks. The DynamoDB Lock Client wraps up the necessary client logic for distributed advisory locks in an easy-to-use client interface...In AWS, the many moving parts of Amazon EC2 also need to agree on what their configuration should be so that they can survive many different failure modes...Systems like Raft and Paxos were designed to address these challenges, but they are notoriously difficult to implement and operate. What if you just want to easily implement a lock for a simple distributed application you’re writing?...The tables are independent, so you can’t just wrap the changes you need in a relational transaction. Instead, you can lock the customer’s unique identifier at a high level. Alternatively, you can lock the unique identifiers of customer details (addresses and telephone numbers) at a finer-grained level. You’d do so with a locking API action for a certain duration in your application before making any changes.
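The lock client itself is a Java library, but the conditional-write trick underneath is simple. A rough boto3 illustration of the idea, assuming a hypothetical table and lease scheme rather than the client's actual schema:

```python
import time
import uuid

import boto3

dynamodb = boto3.client("dynamodb")

def try_acquire(table, lock_key, lease_seconds=10):
    """Return an owner token on success, or None if the lock is held."""
    now = int(time.time())
    owner = str(uuid.uuid4())
    try:
        dynamodb.put_item(
            TableName=table,
            Item={
                "lock_key": {"S": lock_key},
                "owner": {"S": owner},
                "expires": {"N": str(now + lease_seconds)},
            },
            # Succeed only if nobody holds the lock or the previous lease
            # expired; DynamoDB evaluates this atomically with the write.
            ConditionExpression="attribute_not_exists(lock_key) OR expires < :now",
            ExpressionAttributeValues={":now": {"N": str(now)}},
        )
        return owner
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return None  # someone else holds a live lease

token = try_acquire("locks", "customer#1234")
print("acquired" if token else "lock busy")
```

Lease renewal (heartbeating) and safe release are the parts the real client layers on top.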

  • There's an eternal push between centralization and decentralization. As soon as people realize the advantages of decentralizing through microservices they want to recentralize just a little bit to reduce some of the chaos. Good summary of Designing Microservice Architectures the Right Way: Michael Bryzek's Lessons Learned at QCon NY. Key takeaways included: engineers should design schema first for all APIs and events, as this allows the automated code generation of boilerplate code; event streams should be subscribed to in preference to directly calling APIs; investment should be made in automation, such as deployment and dependency management; and the engineering teams must focus on writing simple and effective tests, as this drives quality, streamlines maintenance, and enables continuous delivery...The microservice architecture is a popular modern style of implementing systems, but it needs to be designed the correct way; engineers should strive not to design "spaghetti" systems, and instead aim for a "layered" approach.

  • Another example of he who controls the interpreter controls the world. DINO and IINO: I've been writing for more than 4 years that, at scale, blockchains are DINO (Decentralized In Name Only) because irresistible economies of scale drive centralization. Now, a heist illuminates that, in practice, "smart contracts" such as those on the Ethereum blockchain (which is DINO) are also IINO (Immutable In Name Only)...Wait, "upgradeable" smart contracts on the blockchain? I thought the whole point of blockchains was that they were immutable! Smart contracts were "the steadfast iron will of unstoppable code". What did I know?

  • Interesting mix of answers. If you've moved your API from REST to GraphQL, were you happy with the outcome? ilovecaching: Absolutely. Before GraphQL we were making a monumental effort to build a REST API. After deliberating on exactly what REST was and how we’d represent a few red haired resources, we were spending a lot of client time fetching deep trees through resource links. When we moved to GraphQL it solved a lot of the administrative and philosophical headaches and considerably reduced the number of connections, wasted data, and made our client code so much simpler through easily grokked queries. heavenlyhash: Yes. (Mostly.) REST semantics are a distraction. xentronium: Our front end engineers were extremely happy. Me personally, as a backend engineer, not as much, but it isn't too bad. smt88: For a mature product, no. We closely looked at engineering time and realized we'd never make our investment back. For a new product, yes (so far).

  • Not sure studying for a design interview is the best motivation, but the content is useful. Learn how to design large-scale systems. cjhanks: The architecture is in general 'fine'. But communication paths of subsystems are probably the easiest part of the problem. And in general, re-organizing the architecture of a system is usually possible - if and only if - the underlying data model is sane. The more important questions are:
    • What is the convention for addressing assets and entities? Is it consistent and useful for informing both security and data routing?
    • What is the security policy for any specific entity in your system? How can it be modified? How long does it take to propagate that change? How centralized is the authentication?
    • How can information created from failed events be properly garbage collected?
    • How can you independently audit consistency between all independent subsystems?
    • If a piece of "data" is found, how complex is it to find the origin of this data?
    • What is the policy/system for enforcing that subsystems have a very narrow capability to mutate information?

  • Azure Reserved VM Instances: significantly reduce costs, by up to 72 percent compared to pay-as-you-go prices, with one-year or three-year terms on Windows and Linux virtual machines (VMs). When you combine the cost savings gained from Azure RIs with the added value of the Azure Hybrid Benefit, you can save even more.

  • What is the difference between AI, machine learning, and deep learning? There are many paths to AI. AI is a result, not a technology. Evolutionary algorithm outperforms deep-learning machines at video games: evolutionary computing can match the performance of deep-learning machines at the emblematic task that first powered them to fame in 2013—the ability to outperform humans at arcade video games such as Pong, Breakout, and Space Invaders. The work suggests that evolutionary computing should be feted just as widely as its deep-learning-based relations. Evolutionary computing works in an entirely different way than neural networks. The goal is to create computer code that solves a specific problem using an approach that is somewhat counterintuitive.
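To get a feel for how different it is: no gradients, no backpropagation, just mutate and select. A toy sketch of that loop on a trivial bitstring problem (the work itself evolves program graphs via Cartesian genetic programming; this only shows the general shape):

```python
import random

def fitness(bits):
    # Toy "OneMax" problem: the more 1-bits, the fitter the individual.
    return sum(bits)

def mutate(bits, rate=0.02):
    # Flip each bit independently with a small probability.
    return [b ^ (random.random() < rate) for b in bits]

def evolve(pop_size=50, length=64, generations=200):
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half, refill with mutated survivors.
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(pop_size - len(survivors))
        ]
    return max(population, key=fitness)

best = evolve()
print(f"best individual has {fitness(best)} of 64 bits set")
```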

  • When Should I Use Amazon Aurora and When Should I use RDS MySQL? If you are looking for a native HA solution, you should use Aurora. For a read-intensive workload within an HA environment, Aurora is a perfect match. Combined with ProxySQL for RDS you can get high flexibility. Aurora performance is great, but not as high as expected for write-intensive workloads when secondary indexes exist. In any case, you should benchmark both RDS MySQL and Aurora before making the decision to migrate. Performance depends much on workload and schema design. By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes and upgrades. If you need to use MySQL plugins you should use RDS MySQL. Aurora only supports InnoDB; if you need other engines, e.g. MyISAM, RDS MySQL is the only option. With RDS MySQL you can use specific MySQL releases. Aurora is not included in the AWS free tier and costs a bit more than RDS MySQL. If you only need a managed solution to deploy services in a less expensive way and out-of-the-box availability is not your main concern, RDS MySQL is what you need.

  • No one person's physical shopping experience will need to be like anyone else's anymore. Amazon, Alibaba and Nike all point to the next innovation in retail: Personalized physical spaces: First, it hints at the Holy Trinity of next generation retail—cloud commerce, mobile applications and location analytics—that, when fused together, give retailers an unprecedented level of data capture within the retail value chain. Physical spaces, whether stores or other purposed entities, will become like their digital brethren, understood and analyzed with the same funnel-like metrics and A/B tests that give rise to the personalized experiences consumers encounter online today. Second, it highlights the importance not just of digital in transforming retail, but also of holistic experience design that takes the best of digital but also blends the most important aspects of the physical and the human in order to combat the universal points of friction within today’s shopping experiences. Third, via mobile technology, and likely voice and messaging down the road too, these recent innovations also augur a world in which retailers will have a one-to-one personal connection and platform for communication with their consumers, regardless of whether the ultimate commercial medium is digital or physical. Also, Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates.

  • If only Flutter supported the web. 500,000+ users from React Native to Flutter: After experimenting with Flutter for a little while, the team and I fell in love with the cross-platform consistency, near-instant stateful hot reloading, great tooling and high performance of the platform. Upon deeper inspection of Flutter’s source code, I was delighted to find easily readable and well-documented code...Flutter apps are built using Dart. Like Java but unlike JavaScript, Dart 2 is strongly typed and object-oriented. Unlike Java, Dart is easier on the eyes and feels more lightweight. Coming from JavaScript, it does take time to get used to the class hierarchies and static typing in Dart. That being said, I feel that the additional effort is well worth it in the long run — maintaining and refactoring our codebase has never been easier and safer...Over the course of the past few months, we have received widespread acclaim for Reflectly’s newfound slickness and design, which we to a great extent attribute to Flutter. Our development process is no longer fragmented by platform and we have moved from biweekly releases to weekly releases.

  • Great explanation and source code. Basics of Futexes: The futex proposal is based on a clever observation: in most cases, locks are actually not contended. If a thread comes upon a free lock, locking it can be cheap because most likely no other thread is trying to lock it at the exact same time. So we can get by without a system call, attempting much cheaper atomic operations first. There's a very high chance that the atomic instruction will succeed.
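The kernel half of that story can be poked at even from Python. A hedged ctypes sketch of FUTEX_WAIT/FUTEX_WAKE, assuming x86-64 Linux (the syscall number is architecture-specific, and the user-space atomic fast path that a real lock tries first can't be expressed in pure Python):

```python
import ctypes
import threading
import time

libc = ctypes.CDLL(None, use_errno=True)

SYS_FUTEX = 202   # __NR_futex on x86-64; differs on other architectures
FUTEX_WAIT = 0
FUTEX_WAKE = 1

word = ctypes.c_uint32(0)  # the 32-bit futex word both threads watch

def waiter():
    while word.value == 0:
        # Sleep in the kernel, but only if the word still equals 0. The
        # kernel performs that check atomically, so a wake can't be lost
        # between our read and the sleep.
        libc.syscall(SYS_FUTEX, ctypes.byref(word), FUTEX_WAIT, 0, None, None, 0)
    print("woken; word =", word.value)

t = threading.Thread(target=waiter)
t.start()
time.sleep(0.1)
word.value = 1  # publish the state change...
libc.syscall(SYS_FUTEX, ctypes.byref(word), FUTEX_WAKE, 1, None, None, 0)  # ...wake one waiter
t.join()
```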

  • Netflix-Skunkworks/diffy: Diffy allows a forensic investigator to quickly scope a compromise across cloud instances during an incident, and triage those instances for followup actions. Diffy is currently focused on Linux instances running within Amazon Web Services (AWS), but owing to our plugin structure, could support multiple platforms and cloud providers. It's called "Diffy" because it helps a human investigator to identify the differences between instances, and because Alex pointed out that "The Difforensicator" was unnecessarily tricky.