Stuff The Internet Says On Scalability For April 13th, 2018

High Scalability

13 Apr 2018 — 24 min read

Hey, it's HighScalability time:

Bathroom tile? Grandma's needlepoint? Nope. It's a diagram of the dark web. Looks surprisingly like a tumor.

If you like this sort of Stuff then please support me on Patreon. And I'd appreciate it if you would recommend my new book, Explain the Cloud Like I'm 10, to anyone who needs to understand the cloud (who doesn't?). I think they'll learn a lot, even if they're already familiar with the basics.

$23 billion: Amazon's spend on R&D in 2017; $0.04: cost to unhash your email address; $35: build your own LIDAR; 66%: links to popular sites on Twitter come from bots; 60.73%: companies report JavaScript as primary language; 11,000+: objects in a dataset providing real objects with associated depth information; 150 years: age of the idea of privacy; ~30%: AV1's better video compression; 100s of years: supply of rare-earth materials found underneath Japanese waters; 67%: better image compression using Generative Adversarial Networks; 1000 bit/sec: data exfiltrated from air-gapped computers through power lines using conducted emissions;

Quotable Quotes:
- @Susan_Hennessey: Less than two months ago, Apple announced its decision to move mainland Chinese iCloud data to state-run servers.
- @PaulTassi: Ninja's New 'Fortnite' Twitch Records: 5 Million Followers, 250,000 Subs, $875,000+ A Month via @forbes
- @iamtrask: Anonymous Proof-of-Stake and Anonymous, Decentralized Betting markets are fundamentally rule by the rich. If you can write a big enough check, you can cause anything to happen. I fundamentally disagree that these mechanisms create fair and transparent markets.
- David Rosenthal: The redundancy needed for protection is frequently less than the natural redundancy in the uncompressed file. The major threat to stored data is economic, so compressing files before erasure coding them for storage will typically reduce cost and thus enhance data survivability.
- @mjpt777: The more I program with threads the more I come to realise they are a tool of last resort.
- JPEG XS: For the first time in the history of image coding, we are compressing less in order to better preserve quality, and we are making the process faster while using less energy. Expected to be useful for virtual reality, augmented reality, space imagery, self-driving cars, and professional movie editing.
- Martin Thompson: 5+ years ago it was pretty common for folks to modify the Linux kernel or run cut down OS implementations when pushing the edge of HFT. These days the really fast stuff is all in FPGAs in the switches. However there is still work done on isolating threads to their own exclusive cores. This is often done by exchanges or those who want good predictable performance but not necessarily be the best. A simple way I have to look at it. You are either predator or prey. If predator then you are most likely on FPGAs and doing some pretty advanced stuff. If prey then you don't want to be at the back of the herd where you get picked off. For the avoidance of doubt if you are not sure if you are prey or predator then you are prey. ;-)
- Brian Granatir: serverless now makes event-driven architecture and microservices not only a reality, but almost a necessity. Viewing your system as a series of events will allow for resilient design and efficient expansion. DevOps is dead. Serverless systems (with proper non-destructive, deterministic data management and testing) means that we’re just developers again! No calls at 2am because some server got stuck?
- @chrismunns: I think almost 90% of the best practices of #serverless are general development best practices. be good at DevOps in general and you'll be good at serverless with just a bit of effort
- David Gerard: Bitcoin has failed every aspiration that Satoshi Nakamoto had for it.
- @joshelman: Fortnite is a giant hit. Will be bigger than most all movies this year.
- @swardley: To put it mildly, the reduction in obscurity of cost through serverless will change the way we develop, build, refactor, invest, monitor, operate, organise & commercialise almost everything. Micro services is a storm in a tea cup compared to this category 5.
- James Clear: The 1 Percent Rule is not merely a reference to the fact that small differences accumulate into significant advantages, but also to the idea that those who are one percent better rule their respective fields and industries. Thus, the process of accumulative advantage is the hidden engine that drives the 80/20 Rule.
- Ólafur Arnalds: MIDI is the greatest form of art.
- Abraham Lincoln: Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
- @RichardWarburto: Pretty interesting that async/await is listed as essentially a sequential programming paradigm.
- @PatrickMcFadin: "Most everyone doing something at scale is probably using #cassandra" Oh. Except for @EpicGames and @FortniteGame They went with MongoDB.
- Meetup: In the CloudWatch screenshot above, you can see what happened. DynamoDB (the graph on the top) happily handled 20 million writes per hour, but our error rate on Lambda (the red line in the graph on the bottom) was spiking as soon as we went above 1 million/hour invocations, and we were not being throttled. Looking at the logs, we quickly understood what was happening. We were overwhelming the S3 bucket with PUT requests
- Sarah Zhang: By looking at the polarization pattern in water and the exact time and date a reading was taken, Gruev realized they could estimate their location in the world. Could marine animals be using these polarization patterns to navigate through the ocean?
- Vinod Khosla: I have gone through an exercise of trying to just see if I could find a large innovation coming out of big companies in the last twenty five years, a major innovation (there’s plenty of minor innovations, incremental innovations that come out of big companies), but I couldn’t find one in the last twenty five years.
- BBC: "He didn't think the police would be able to catch him from a crowd of 60,000 so quickly," Mr Li, from Honggutan police station in Nanchang city, added. Mr Li also told China Daily that there were several cameras at the ticket entrances equipped with facial recognition technology. An estimated 170 million CCTV cameras are already in place [in China] and some 400 million new ones are expected to be installed in the next three years.
- Alexis Madrigal: But a core problem in knitting the neural-network designs is that there was no actual intent behind the instructions. And that intent is a major part of how knitters come to understand a given pattern.
- @QuinnyPig: I live in fear that @GCPcloud will announce something I love that they'll kill two years later. I live in fear that @awscloud will announce something lackluster that my grandchildren will be able to access. On balance, I vastly prefer the latter.
- Frank Schmid: Of course, adopting serverless architectures is not an ad-hoc thing. It requires a change in people's minds, which can be a very slow process; you might even have to wait for a new generation of software engineers to occupy senior-level positions.
- Vinton Cerf: Privacy may actually be an anomaly.
- @omphe: It had not occurred to me that microservices (or component decoupling in general) is an effective strategy for isolating regulatory compliance. e.g. a highly regulated service with slower release cycles won't hold up wider progress. Bi-modal IT, but distributed.
- @jtopper: We're working with designs currently where we can go one dimension further than that: using container identity to isolate access to per-tenant secrets in a multi tenant environment.
- @swardley: "for any digital transformation project to be successful, organisations can ill afford to consider any piece of the technology puzzle in isolation" - this is so wrong, you absolutely must view it as a component and after this the interactions with other components ...
- Anouska: But our blogs are OURS. We have spent years building our brands from the ground up, and it all started with our blogs. It is our very own personal space online. We control it. We decide what content goes up. We decide if we want to make changes to our website. There are no algorithms. There is no f*ckery. So I think it’s time to bring the blogs BACK.
- Alex Casalboni: The biggest [serverless] challenge I see is also the consequence of the perceived velocity of serverless, especially when you get started. You may end up thinking that you can simply ignore all the pieces and best practices that make serverless possible. For example, most developers don't invest enough time in learning Amazon CloudFormation and AWS IAM. These services are complex, and it may seem like a lot of useless work, but it'll actually save you a lot of troubles during debugging and also make your Functions safer and more robust.
- @zemlyansky: The compressed image looks awesome! The problem is it's not really compression of the original, but rather a realistic hallucination based on it. For example, your algorithm removes the car behind the bus. WebP and BPG versions contain more info than the GAN picture
- Gordon Bell: The cheapest, fastest and most reliable components of a computer system are those that aren't there.
- Daniel Lemire: Thus Intel processors have an easier time avoiding cache misses when the data loads are batched.
- @ScottMAustin: Asian investors directed nearly as much money into startups last year as U.S. VCs did—40% of the record $154B in global financing versus 44%.
- TED: L16, has 16 individual cameras that simultaneously capture the scene, and Laroia says its “real magic” are its sophisticated computational and machine learning algorithms, which fuse all of the images together into a single 52-megapixel photo. Thanks to its software and multiple lenses, the camera can deliver photos that are three-dimensional in their depth perception.
- Quirky: Outsiders are important to innovation; they often operate in fields where they are highly motivated to solve problems in which they are personally invested. They often look at problems in different ways from those who are well indoctrinated in the field, and they may question (or ignore) assumptions that specialists take for granted.
- Jaron Lanier: We cannot have a society in which, if two people wish to communicate, the only way that can happen is if it’s financed by a third person who wishes to manipulate them
- rspeer: Most technologies that were specific to the "Semantic Web", such as OWL and SPARQL, failed to scale and failed to solve realistic problems, and therefore died. (I always maintained that running a SPARQL endpoint amounted to running a DDoS on yourself.) However, we got something kind of cool out of the RDF model that underlies it, especially when some sufficiently opinionated developers identified the bad parts of RDF and dumped them. We got JSON-LD [1], a way for making APIs describe themselves in a way that's compatible with RDF's data model. For what I mean about sufficiently opinionated developers, I recommend reading Manu Sporny's "JSON-LD and Why I Hate the Semantic Web" [2], a wonderful title for the article behind the main reason the Semantic Web is still relevant.
- jsjaspreet: At my current workplace we're moving to GraphQL slowly and use it as a Gateway to front our backend REST APIs. Backend feature teams own their APIs, but are all consumed by one GraphQL API which is the nexus of backend communication for things like periodic tasks and our client applications. None of the APIs call each other directly. I think it's working out really well for the front end teams and aligning well with our engineering organization as a whole. As for who owns the GraphQL repos, everyone contributes to it but we rely on engineering leadership to keep an eye on changes as the system solidifies.
- Ed Sperling: The rule of thumb used to be that on-chip processing is always faster than off-chip processing. But the distance between two chips in a package can be shorter than routing signals from one side of an SoC to another over a skinny wire, which at advanced nodes may encounter RC delay. None of this is simple, however, and it gets worse in new areas such as 5G. In the future, advanced packaging will need to become almost ubiquitous to drive widespread applications of AI/ML/DL inference at edge nodes and in automotive and a variety of other new market segments. That requires repetition with some degree of flexibility on design—basically the equivalent of mass customization. This is the direction the packaging world ultimately will take, but it will require some hard choices about how to get there. The interconnect will remain the centerpiece of all of these decisions, but which interconnect remains to be seen.
- Where Wizards Stay Up Late: The idea on which Lick’s worldview pivoted was that technological progress would save humanity. The political process was a favorite example of his. In a McLuhanesque view of the power of electronic media, Lick saw a future in which, thanks in large part to the reach of computers, most citizens would be “informed about, and interested in, and involved in, the process of government.” He imagined what he called “home computer consoles” and television sets linked together in a massive network. “The political process,” he wrote, “would essentially be a giant teleconference, and a campaign would be a months-long series of communications among candidates, propagandists, commentators, political action groups, and voters. The key is the self-motivating exhilaration that accompanies truly effective interaction with information through a good console and a good network to a good computer.” Lick’s thoughts about the role computers could play in people’s lives hit a crescendo in 1960 with the publication of his seminal paper “Man-Computer Symbiosis.” In it he distilled many of his ideas into a central thesis: A close coupling between humans and “the electronic members of the partnership” would eventually result in cooperative decision making.
- Andrew Webster: Historically, the relationship between smaller [game] developers and publishers has been viewed as antagonistic. The developers just want to make their game, but the publishers are worried about money. This new wave of boutique publishers is changing that perspective. Instead of simply providing cash and deadlines, these labels are seen more as development partners, and it’s winning over even some of the more skeptical game developers. “We started off being a lot more afraid of them,” says Wong. “I think at the time I didn’t really know what I needed besides funding. But now I understand what a publisher should be for, which is they’re creative partners.”

Ólafur Arnalds built a robotic music system to accompany him on the piano. He calls his system of two semi-generative, self-playing pianos STRATUS. You can hear his most recent song re:member. There's more explanation in The Player Pianos pt. II and a short Facebook live session. His software reacts to his playing in real time. Then he has to react to the robots because he's not sure what they're going to do. He's improvising like a jazz band would, but with robots. It's his own little orchestra. The result is beautiful. The result is also unexpected. Ólafur makes the fascinating point that your own improvisation is usually limited by your own muscle memory. But with the randomness of the robots you are forced to respond in different ways. He says you get a "pure unrestricted creativity." And it's fun, he says with a big smile on his face. In What Will Programming Look Like In The Future? I said this is the best way I can think of to show what software development will look like in the future.

Want to understand how Go's interfaces work at the assembly level? Have fun: teh-cmc/go-internals.

The In Our Time podcast has a fascinating episode on the great engineers George and Robert Stephenson. Some software parallels struck me:
- Design systems: George was the first to design the cars and the rails together to make a better overall system.
- Complex systems start from simpler working systems: railroads existed before steam engines, when cars were pulled by horses, so the idea was not new.
- Success makes standards: a new track gauge was established by building a successful new line.
- External events drive innovation: horses were expensive because of the Napoleonic Wars, which drove development of mechanical power.
- Bloodlines matter: do kids learn from their parents or is it nature? The son became an excellent engineer.
- Enterprises drive business: coal owners needed trains to transport coal so they made it happen.
- Unexpected uses: though trains were meant for commerce, they unleashed the desire to travel. Trains expanded to meet the demand. Diets changed. People commuted to work.
- Old doesn't like the new: if you invested a lot of money in canals these new trains threatened your investment.
- Networks matter more than lines: strategically building a network of railway lines was more important than building any one line.
- Engineering common sense creates systems that work.
- There's always a bubble: investment in new train lines reached bubbelic proportions.
- Classes first, then the masses: train travel was very expensive at first, eventually it became cheap enough for most people to use.
- Big projects require coalitions: money, government, talent, and management must work together for large projects to succeed.
- Good teams win: train lines are hard to build. They hired good teams, dogfooded their product by traveling the lines, were pragmatic, kept a close watch on the details, and worked tirelessly to overcome insane obstacles.
- Specialization reduces power: engineers at one time had a lot of clout and respect. When they specialized into smaller more insular groups, engineers became labor to buy and the financiers and lawyers rose in status.

Who knew there were 2,000 year old datacenters? 10 Things I Learned Shipping an Ancient Data Center to AWS.

True or false: the internet was created to survive a nuclear war? From my reading of Where Wizards Stay Up Late: The Origins Of The Internet the answer is mostly false, but a little true. It seems that in the mid 1960s three different parties (Paul Baran, Donald Davies, and Larry Roberts/Wesley Clark) independently invented the idea of the packet switched network, which eventually became the internet. Paul Baran, while working at RAND, came up with the idea first. Baran's motivation clearly was surviving a nuclear war. The cold war was hot and he proved AT&T's network would fail under a nuclear attack. Baran, inspired by neural architecture, came up with the idea of the distributed network. Routing around problems and finding new pathways was the right way to deal with network damage. Baran also invented the idea of fracturing messages into parts called "message blocks" that could follow different paths over the network to their destination. Baran's message blocks would be routed around the network by unmanned switches using a “hot potato routing” algorithm. In contrast to store-and-forward, "hot potato" routing sent packets out along a self-learned route as quickly as possible. Nuke a few nodes and as long as there were 3-4 different routes between each node the packet would eventually find its destination. In 1965, about the time Baran stopped his work, Donald Davies, a physicist at the British National Physical Laboratory (NPL), independently invented a network similar to what Paul Baran invented. In fact, he came up with the word "packet" that we still use today. Davies did not know about Baran's work and his motivations were completely different. Davies simply wanted a better network, one that would better support the requirements of interacting computers, whose bursty data traffic did not fit well with the uniform channel capacity of the telephone system. Unlike Baran's, Davies' design was well received in Britain.
Larry Roberts worked for an agency of the United States Department of Defense called ARPA (Advanced Research Projects Agency). Larry was tasked with building a network to connect very different and very expensive computers together so they could work together and share data. This would become the ARPANET. Wes Clark came up with the idea of a subnetwork with small, identical nodes, all interconnected. These nodes would connect directly to the hosts so the hosts didn't have to do any of the networking work. The hosts were time-shared computers that were already busy with all the work they could handle. By using a separate network of nodes to connect hosts, computer owners couldn't complain about how their computers were being over-burdened with network functions. These network nodes were called IMPs (interface message processors) and they performed all the functions of connecting the network. IMPs handled sending and receiving data, error checking, data retransmitting, message verification, routing, and interfacing with host computers. This work was all done independent of Baran and Davies. Eventually they all learned of each other's work, and ideas from Baran's work and Davies' work informed ARPANET's design. After TCP/IP was invented ARPANET would transform into the internet. ARPANET started out as a way to connect computers together so they could share resources. The internet evolved into a global tool for communication. Would it survive a nuclear war? Stupid question. Nothing will survive a nuclear war.
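
To make the "route around damage" idea concrete, here is a minimal Python sketch. It is not Baran's actual algorithm: the grid topology, the function names, and the use of a breadth-first search as a stand-in for each switch's self-learned route table are all assumptions made for illustration. Each node has up to four neighbors, some nodes get destroyed, and a message block is handed hop by hop to whichever surviving neighbor is closest to the destination.

```python
from collections import deque

def grid_network(width, height):
    """Build a distributed mesh: every node links to up to 4 neighbors."""
    links = {}
    for x in range(width):
        for y in range(height):
            around = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            links[(x, y)] = [n for n in around
                             if 0 <= n[0] < width and 0 <= n[1] < height]
    return links

def learn_distances(links, destroyed, destination):
    """Hop counts to the destination over surviving nodes only.
    Stand-in (assumption) for the routes each switch would learn itself."""
    distance = {destination: 0}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in destroyed and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

def forward(links, destroyed, source, destination):
    """Pass a message block hop by hop; return the path, or None if cut off."""
    if source in destroyed or destination in destroyed:
        return None
    distance = learn_distances(links, destroyed, destination)
    if source not in distance:
        return None  # no surviving route
    path = [source]
    while path[-1] != destination:
        candidates = [n for n in links[path[-1]] if n in distance]
        path.append(min(candidates, key=distance.get))
    return path

links = grid_network(5, 5)
destroyed = {(2, 2), (1, 2), (2, 1)}          # "nuke a few nodes"
route = forward(links, destroyed, (0, 0), (4, 4))
cut_off = forward(links, {(0, 1), (1, 0)}, (0, 0), (4, 4))
```

With the middle of the grid destroyed, `route` still reaches the far corner because other paths exist. `cut_off` is `None` because both of the source's neighbors are gone, which is why the number of independent routes per node matters.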

Pretty slick. Building Serverless React GraphQL Applications with AWS AppSync: AWS AppSync is extremely powerful, and in this tutorial we’ve just scratched the surface. In addition to DynamoDB, AppSync also supports ElasticSearch & Lambda functions out of the box.

Fragmented packets are a problem for DNS. The internet needs redisintermediation. Just say no to middle boxes, but since that won't happen there's ATR. The Problem: One of the major issues is the ossification of the network due to the constraining actions of various forms of active middleware. The original idea was that the Internet was built upon an end-to-end transport protocol (TCP and UDP) layered above a simple datagram Internet Protocol. The network’s active switching elements would only look at the information contained in the IP packet header, and the contents of the “inner” transport packet header were purely a matter for the two communicating end points. But that was then, and today is different. Middle boxes that peer inside each and every packet are pervasively deployed. This has reached the point where it's now necessary to think of TCP and UDP as network protocols rather than host-to-host protocols. One of the more pressing and persistent problems today is the treatment of fragmented packets. We are seeing a very large number of end-to-end paths that no longer support the transmission of fragmented IP datagrams. This count of damaged paths appears to be getting larger not smaller...Fragmented packet drop is also depressingly common. Earlier work in September 2017 showed a failure rate of 38% when attempting to deliver fragmented IPv6 UDP packets to DNS recursive resolvers...An approach to address this challenge is that of “Additional Truncated Response” (documented as an Internet Draft: draft-song-atr-large-resp-00, September 2017, by Linjian (Davey) Song of the Beijing Internet Institute). The approach described in this draft is simple: If a DNS server provides a response that entails sending fragmented UDP packets, then the server should wait for a 10ms period and also send back the original query as a truncated response...
The case for ATR certainly looks attractive if the objective is to improve the speed of DNS resolution when passing large DNS responses.
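
The ATR rule quoted above can be sketched as plain decision logic in Python. This is an illustration only, with no real sockets or DNS library involved. The 1472-byte threshold (a 1500-byte Ethernet MTU minus IPv4 and UDP headers), the `Packet` type, and both function names are assumptions. The 10ms delay comes from the text, and a resolver retrying over TCP when it sees a truncated response is standard DNS behavior.

```python
from dataclasses import dataclass

MAX_UNFRAGMENTED_PAYLOAD = 1472  # assumption: 1500-byte MTU - 20 (IPv4) - 8 (UDP)
ATR_DELAY_MS = 10                # delay named in the draft, per the text

@dataclass
class Packet:
    send_after_ms: int
    truncated: bool      # True: truncation flag set, no answer data
    payload_bytes: int

def plan_response(response_bytes, query_bytes):
    """Server side: always send the full answer; if it will fragment,
    also send a small truncated response 10ms later."""
    packets = [Packet(0, False, response_bytes)]
    if response_bytes > MAX_UNFRAGMENTED_PAYLOAD:
        packets.append(Packet(ATR_DELAY_MS, True, query_bytes))
    return packets

def resolver_action(packets, path_drops_fragments):
    """Resolver side: act on the first packet that actually arrives."""
    for packet in packets:
        fragmented = packet.payload_bytes > MAX_UNFRAGMENTED_PAYLOAD
        if fragmented and path_drops_fragments:
            continue  # silently lost somewhere in the middle
        return "retry over TCP" if packet.truncated else "use UDP answer"
    return "timeout"

small = plan_response(512, 40)
large = plan_response(3000, 40)
clean_path = resolver_action(large, path_drops_fragments=False)
broken_path = resolver_action(large, path_drops_fragments=True)
without_atr = resolver_action(large[:1], path_drops_fragments=True)
```

On a clean path the resolver uses the fragmented UDP answer and ignores the late truncated response. On a path that drops fragments it falls back to TCP after roughly 10ms instead of waiting for a timeout, which is the speed-up the excerpt refers to.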

Billions - Axe hides his secret Wall Street trading profits in crypto. I so wanted to know which cryptocurrency! Also, Why Bitcoin is bullshit, explained by an expert.

Storage is cheap, abuse it. What we learned doing serverless — the Smart Parking story: What is data mitosis? It's replication of data into multiple tables that are optimized for specific queries. What? I'm replicating data just to overcome indexing limits? This is madness. NO! THIS. IS. SERVERLESS! While it might sound insane, storage is cheap. In fact, storage is so cheap, we'd be naive to not abuse it. This means that we shouldn't be afraid to store our data as many times as we want to simply improve overall access. Bigtable works efficiently with billions of rows. So go ahead and have billions of rows. Don't worry about capacity or maintaining a monstrous data cluster, Google does that for you. This is the power of serverless. I can do things that weren't possible before. I can take a single record and store it ten (or even a hundred) times just to make data sets optimized for specific usages (i.e., for specific queries)...By using BigQuery, we can scale our searches across massive data sets and get results in seconds. Seriously. All we need to do is make our data accessible...In our architecture, this is almost too easy. We simply add a Cloud Function that listens to all our events and streams them into BigQuery. Just subscribe to the Pub/Sub topics and push...Make sure you understand Cloud Functions fully. See them as tiny connectors between a given input and target output (preferably only one). Use this to make boilerplate code. Each Cloud Function should contain a configuration and only the lines of code that make it unique. It may seem like a lot of work, but making a generic methodology for handling Cloud Functions will liberate you and your code.
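
A rough Python sketch of "data mitosis" as described above: one incoming event is written several times, once per table, with each table's row key arranged for a single query. Plain dictionaries stand in for Bigtable tables, and the event fields, table names, and key layout are hypothetical, not taken from the Smart Parking article.

```python
def store_event(tables, event):
    """Write the same record into every query-specific table."""
    row_keys = {
        # optimized for "history of one parking spot"
        "by_spot": f"{event['lot']}#{event['spot']}#{event['ts']}",
        # optimized for "everything in a lot during a time window"
        "by_time": f"{event['lot']}#{event['ts']}#{event['spot']}",
    }
    for table_name, key in row_keys.items():
        tables[table_name][key] = event

def prefix_scan(table, prefix):
    """Return rows whose key starts with the prefix, in key order."""
    return [table[key] for key in sorted(table) if key.startswith(prefix)]

tables = {"by_spot": {}, "by_time": {}}
events = [
    {"lot": "A", "spot": "007", "ts": "2018-04-13T09:00", "occupied": True},
    {"lot": "A", "spot": "012", "ts": "2018-04-13T09:05", "occupied": True},
    {"lot": "A", "spot": "007", "ts": "2018-04-13T10:30", "occupied": False},
]
for event in events:
    store_event(tables, event)

spot_history = prefix_scan(tables["by_spot"], "A#007#")
nine_oclock = prefix_scan(tables["by_time"], "A#2018-04-13T09")
```

Each query becomes a cheap prefix scan on the table built for it, at the cost of storing every event twice. That is the trade the article argues is worth making when storage is cheap.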

You're unlikely to find a better explanation. Hash-based Signatures: An illustrated Primer: the imminent arrival of quantum computers is going to have a huge impact on the security of nearly all of our practical signature schemes, ranging from RSA to ECDSA and so on. This is due to the fact that Shor’s algorithm (and its many variants) provides us with a polynomial-time algorithm for solving the discrete logarithm and factoring problems, which is likely to render most of these schemes insecure. Most implementations of hash-based signatures are not vulnerable to Shor’s algorithm. That doesn’t mean they’re completely immune to quantum computers, of course. The best general quantum attacks on hash functions are based on a search technique called Grover’s algorithm, which reduces the effective security of a hash function.
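
For a feel of how a signature can be built from nothing but a hash function, here is a minimal Python sketch of a Lamport one-time signature, the classic simplest hash-based scheme. It is a teaching sketch, not production code: the choice of SHA-256 and the function names are assumptions, and each key pair must sign only one message.

```python
import hashlib
import secrets

def H(data):
    return hashlib.sha256(data).digest()

def keygen():
    """256 pairs of random secrets; the public key is their hashes."""
    secret = [(secrets.token_bytes(32), secrets.token_bytes(32))
              for _ in range(256)]
    public = [(H(zero), H(one)) for zero, one in secret]
    return secret, public

def message_bits(message):
    """The 256 bits of the message digest, most significant bit first."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(secret, message):
    """Reveal one secret per digest bit: the 0-secret or the 1-secret."""
    return [secret[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(public, message, signature):
    """Hash each revealed secret and compare with the matching public hash."""
    bits = message_bits(message)
    return len(signature) == 256 and all(
        H(revealed) == public[i][bit]
        for i, (bit, revealed) in enumerate(zip(bits, signature)))

secret_key, public_key = keygen()
signature = sign(secret_key, b"hello")
good = verify(public_key, b"hello", signature)
forged = verify(public_key, b"goodbye", signature)
```

Security rests only on the hash being hard to invert, which is why Shor's algorithm does not apply. Grover-style search roughly halves the effective bit strength, one reason to pick a 256-bit hash rather than a shorter one.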

Run Sausage Run! has had 21 Million Downloads in 2 Months. How do you make a game successful even if it isn’t viral or wildly addictive, it’s full of ads, and there isn’t a lot of original content or unique game-play? Adopt a cute narrative: a sausage runs around the kitchen, keeping away from knives — and that’s the whole story. Create an icon which connects to the narrative and engages potential users to look at it and ask — “Why the hell is the sausage running, anyway?” Ask for a rating on a successful user outcome, but you can even ask after a failure. Create familiar characters like ninja, cactus, hair dude, robot, cowboy, that people always like. Silly, rhythmic music was very carefully done, and plays an important role in the game. Even more brilliant use of music was made in video ads for the game.

Evolutionary Serverless Architecture: JeffConf Hamburg 2018 talk. Start as simple as possible: automate deployment, use small managed building blocks, add monitoring/tracking everywhere, don't optimize. Evolve architecture over time: Analyze monitoring/tracking data, optimize where necessary, decouple using step functions. Other learnings: step functions are awesome timers, user interaction with step functions is ugly, idempotent state transitions, remote calls need retry logic.
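
The last two learnings go together: if remote calls are retried, the state transitions they trigger need to be idempotent. A generic Python sketch follows. It does not use the AWS Step Functions API, and the transition IDs, function names, and in-memory `applied` record are all assumptions.

```python
import time

def call_with_retry(operation, attempts=3, base_delay=0.01):
    """Retry a remote call with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

applied = {}  # transition id -> result; stands in for a durable store

def apply_transition(transition_id, state, new_status):
    """Idempotent: repeating the same transition id changes nothing."""
    if transition_id in applied:
        return applied[transition_id]
    state["history"].append(new_status)
    applied[transition_id] = new_status
    return new_status

# Simulate a call that succeeds remotely but loses its response once.
order = {"history": []}
calls = {"count": 0}

def flaky_remote_call():
    calls["count"] += 1
    result = apply_transition("order-1:paid", order, "paid")
    if calls["count"] == 1:
        raise ConnectionError("response lost")
    return result

outcome = call_with_retry(flaky_remote_call)
```

The call runs twice, but the order's history records "paid" only once, because the second attempt finds the transition already applied.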

Kayenta runs approximately 30% of our production canary judgments, which amounts to an average of 200 judgments per day. Automated Canary Analysis at Netflix with Kayenta: At Netflix, we augment the canary release process and use three clusters, all serving the same traffic with different amounts: The production cluster. This cluster is unchanged and is the version of software that is currently running. This cluster may run any number of instances; The baseline cluster. This cluster runs the same version of code and configuration as the production cluster; Typically, 3 instances are created; The canary cluster. This cluster runs the proposed changes of code or configuration. As in the baseline cluster, 3 instances are typical.
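
A toy Python sketch of why the canary is compared against a fresh baseline cluster of the same size rather than against production. This is not Kayenta's judging algorithm: the error-count metric, the 10% tolerance, and the function name are assumptions for illustration.

```python
from statistics import mean

def judge_canary(baseline_samples, canary_samples, tolerance=0.10):
    """Pass if the canary's mean metric is within tolerance of the baseline.
    Samples are one reading per instance (e.g. errors per 1000 requests)."""
    baseline = mean(baseline_samples)
    canary = mean(canary_samples)
    if baseline == 0:
        return "pass" if canary == 0 else "fail"
    return "pass" if canary / baseline <= 1 + tolerance else "fail"

# Three instances in each cluster, as in the excerpt.
baseline_errors = [10, 12, 11]
healthy = judge_canary(baseline_errors, [11, 10, 12])
regressed = judge_canary(baseline_errors, [20, 25, 22])
```

Because baseline and canary have the same instance count and age, differences between them are more likely to come from the proposed change than from cluster size or warm caches.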

What We Learned Deploying Deep Learning at Scale for Radiology Images: The AI community including Facebook, Microsoft and Amazon came together to release Open Neural Network Exchange (ONNX) making it easier to switch between tools as per need...But for our present needs, deploying models in Pytorch has sufficed...Docker: For operating system level virtualization...Anaconda: For creating python3 virtual environments and supervising package installations...Django: For building and serving RESTful APIs...Pytorch: As deep learning framework...Nginx: As webserver and load balancer...uWSGI: For serving multiple requests at a time...Celery: As distributed task queue...We use Amazon EC2 P2 instances as our cloud GPU servers primarily due to our team’s familiarity with AWS...Initially, we started with buying new P2 instances. Optimizing their usage and making sure that few instances are not bogged down by the incoming load while other instances remain comparatively free became a challenge. It became clear that we needed auto-scaling for our containers...We decided to go ahead with Kubernetes...Since many hospitals and radiology centers prefer on-premise deployment, Kubernetes is clearly more suited for such needs...Another thought was to keep the models loaded in memory and process images through them as the requests arrive. This is a good solution where you need to run your models every second or even millisecond (think of AI models running on millions of images being uploaded to Facebook or Google Photos)

Can humans and machine agents just get along? Perhaps if they keep in mind these Ten challenges for making automation a "team player" in joint human-agent activity: To be a team player, an intelligent agent must fulfill the requirements of a Basic Compact to engage in common-grounding activities; To be an effective team player, intelligent agents must be able to adequately model the other participants' intentions and actions vis-à-vis the joint activity's state and evolution; Human-agent team members must be mutually predictable; Agents must be directable; Agents must be able to make pertinent aspects of their status and intentions obvious to their teammates; Agents must be able to observe and interpret pertinent signals of status and intentions; Agents must be able to engage in goal negotiation; Support technologies for planning and autonomy must enable a collaborative approach; Agents must be able to participate in managing attention; All team members must help control the costs of coordinated activity.

If you use Hadoop and standardized on HDFS for your file system and have run into performance and scaling problems then there's a lot of good technical advice in Scaling Uber’s Hadoop Distributed File System for Growth.

Index Structures, Access Methods, whatever: b-tree - provides better read efficiency at the cost of write efficiency. The worst case for write efficiency is writing back a page for every modified row; LSM - provides better write efficiency at the cost of read efficiency. Leveled compaction provides amazing space efficiency. Tiered compaction gets better write efficiency at the cost of space efficiency; index+log - provides better write efficiency. Depending on the choice of index structure this doesn't sacrifice read efficiency like an LSM. But the entire index must remain in RAM (just like a non-clustered b-tree) or GC will fall behind and/or do too many storage reads.
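
Of the three, index+log is the easiest to sketch. The toy Python version below uses an in-memory list as a stand-in for the on-disk log and a dictionary as the in-RAM index. The class and method names are assumptions, and real systems add persistence, checksums, and concurrency control.

```python
class IndexLogStore:
    def __init__(self):
        self.log = []    # append-only (key, value) records; stands in for a file
        self.index = {}  # in-RAM index: key -> position of the newest record

    def put(self, key, value):
        """Writes are a cheap append plus an index update."""
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        """Reads are one index lookup and one log read."""
        return self.log[self.index[key]][1]

    def garbage_collect(self):
        """Copy only live records into a new log, preserving their order.
        The index tells GC which records are live without reading storage."""
        live_positions = sorted(self.index.values())
        new_log = [self.log[pos] for pos in live_positions]
        self.index = {key: i for i, (key, _) in enumerate(new_log)}
        self.log = new_log

store = IndexLogStore()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)           # the old record for "a" becomes garbage
size_before = len(store.log)
store.garbage_collect()
size_after = len(store.log)
```

The sketch shows why the index needs to stay in RAM: garbage collection consults it for every record, so if those lookups went to storage GC would either fall behind or generate many extra reads.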

It's easy to make fun of the lack of security for systems created in a more innocent time, but IoT today isn't generally much better, is it? Flaw exposes cities' emergency alert sirens to hackers: These emergency systems are found across the US, primarily used to warn against natural disasters and terrorist attacks, but also inbound threats from hostile nation states. The systems are far from perfect. Almost exactly a year ago, an unknown hacker replayed a radio signal used during regular scheduled tests of the system to maliciously trigger Dallas' emergency alert system in the middle of the night.

How do you provide near real-time analytics to your customers on billions of search queries per day? Algolia shows how they did it in Building Real Time Analytics APIs at Scale. RedShift, BigQuery and ClickHouse weren't real-time enough. And for a public API you don't want pricing driven by usage. So they turned to Citus Data and their Citus extension for PostgreSQL, which makes it seamless to scale Postgres by distributing tables and queries across multiple nodes. The Postgres COPY command is used to insert batches of events into Citus. A single customer’s data lives on the same shard so it can take advantage of collocation. Metrics are not served from raw events. As a rule of thumb, you can expect to aggregate 1M rows per second per core with PostgreSQL. Roll-up tables are used. Returning tops and distinct counts is made easy thanks to the TOPN and HLL extensions. For their analytics solution they have several levels of rollups. They aggregate events every 5 minutes, and further aggregate them by day. Rolled up data is deleted so terabytes of storage aren't required. There's a compression ratio ranging from 50,000 to 150 on average. Since the metrics are pre-computed per day, results can be returned in milliseconds across virtually any time range. Their pipeline was built in Go using a microservices approach.
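The two-level rollup idea can be sketched in a few lines. This is an in-memory illustration only, assuming events arrive as hypothetical `(unix_timestamp, customer_id, query)` tuples. Algolia does this with roll-up tables inside Postgres/Citus, not in application code like this.

```python
from collections import Counter

FIVE_MINUTES = 300   # seconds
ONE_DAY = 86400      # seconds


def bucket(ts, width):
    """Round a Unix timestamp down to the start of its bucket."""
    return ts - ts % width


def rollup_5min(events):
    """Count raw events per (customer, 5-minute bucket, query)."""
    counts = Counter()
    for ts, customer, query in events:
        counts[(customer, bucket(ts, FIVE_MINUTES), query)] += 1
    return counts


def rollup_daily(five_min_counts):
    """Fold the 5-minute rollup into per-day counts without touching raw events."""
    daily = Counter()
    for (customer, start, query), n in five_min_counts.items():
        daily[(customer, bucket(start, ONE_DAY), query)] += n
    return daily


events = [
    (0, "c1", "shoes"),
    (10, "c1", "shoes"),
    (301, "c1", "shoes"),
    (86400, "c1", "hats"),
]
five = rollup_5min(events)
daily = rollup_daily(five)
print(daily.most_common(1))
```

Queries over a date range then read a handful of daily rows instead of scanning every raw event, which is where the millisecond response times come from.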

Good comparison. Comparing Kubernetes to Pivotal Cloud Foundry — A Developer’s Perspective: Cloud Foundry is a cloud-agnostic platform-as-a-service solution...Kubernetes is an open source cloud platform that originated from Google’s Project Borg...Both use the idea of containers to isolate your application from the rest of the system...both are designed to let you run either on public cloud infrastructure (AWS, Azure, GCP etc.), or on-prem...Both offer the ability to run in hybrid/multi-cloud environments...both support Kubernetes as a generic container runtime...First and foremost, Cloud Foundry is a PaaS. I don’t feel Kubernetes fits this description...As a developer, the biggest differentiator for me is how Cloud Foundry takes a very Spring-like, opinionated approach to development, deployments and management...Kubernetes takes a different approach. It is inherently a generic container runtime that knows very little about the inner-workings of your application.

A fun debugging story caused by a problem with...wait for it...garbage collection and thread madness. Shocked! Shocked I say. Debugging a long-running Apache Spark application: A War Story: The default garbage-collection interval was simply too long for our use case: 30 minutes. Running every 30mins was way too little since we generate hundreds of thousands of classes during peak load. This means that the keepCleaning thread is doing nothing for the first 30 minutes, and then is suddenly swamped with way too many tasks to keep up with. This problem then keeps getting worse and worse since the cleanup tasks are generated faster than they can be processed, which in turn leads to bigger and bigger GC heaps, a truly overwhelming task for the GC, until it can’t keep up any more and then the whole cluster dies in a slow and agonizing death.
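The quote doesn't name the setting. A minimal sketch, assuming it refers to Spark's `spark.cleaner.periodicGC.interval` property (default `30min`): the helper below builds a `spark-submit` command line that overrides it. The `5min` value and the `my_app.py` name are placeholders, not what the authors used.

```python
def spark_submit_args(app, conf):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Hypothetical override: trigger the periodic GC more often than the default.
conf = {"spark.cleaner.periodicGC.interval": "5min"}
print(" ".join(spark_submit_args("my_app.py", conf)))
```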

aws/chalice: is a python serverless microframework for AWS. It allows you to quickly create and deploy applications that use Amazon API Gateway and AWS Lambda. It provides: A command line tool for creating, deploying, and managing your app; A familiar and easy to use API for declaring views in python code; Automatic IAM policy generation

OpenHFT/Java-Thread-Affinity: Lets you bind a thread to a given core, this can improve performance (this library works best on linux).
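For a feel of what core pinning looks like, here is a rough Python analogue. It is not the OpenHFT API. It uses the standard library's `os.sched_setaffinity`, which is only available on some platforms (Linux among them), so the demo is guarded. The `pin_to_cpu` helper name is made up for this sketch.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller to a single CPU and return the resulting CPU set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the caller"
    return os.sched_getaffinity(0)


if hasattr(os, "sched_setaffinity"):
    original = os.sched_getaffinity(0)   # CPUs we are currently allowed on
    target = min(original)               # pick any allowed CPU
    print("pinned to", pin_to_cpu(target))
    os.sched_setaffinity(0, original)    # restore the original mask
```

Pinning keeps a hot thread's cache contents on one core and avoids migration by the scheduler, which is where the performance gain comes from.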

tower-rs (article): a library for writing robust network services with Rust. It is being built in service of the Conduit proxy, which is using the Tokio ecosystem to build the world’s smallest, fastest, most secure network proxy.

spinnaker/kayenta (article): a platform [from Netflix] for Automated Canary Analysis (ACA). It is used by Spinnaker to enable automated canary deployments. A canary release is a technique to reduce the risk from deploying a new version of software into production. A new version of software, referred to as the canary, is deployed to a small subset of users alongside the stable running version.
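The "small subset of users" part of a canary release can be sketched as deterministic hashing of a user id. This is only an illustration of the routing idea, with hypothetical function names and a made-up 5% default. It is not how Kayenta or Spinnaker work, and Kayenta's job (analyzing canary metrics against the stable version) is not shown here.

```python
import hashlib


def in_canary(user_id, percent):
    """Deterministically place roughly `percent` percent of users in the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100  # stable slot in 0..99
    return slot < percent


def route(user_id, percent=5):
    """Return which version should serve this user."""
    return "canary" if in_canary(user_id, percent) else "stable"


users = [f"user-{i}" for i in range(10000)]
share = sum(route(u) == "canary" for u in users) / len(users)
print(f"canary share: {share:.3f}")
```

Hashing rather than random choice means a given user sees the same version on every request, which keeps the comparison between canary and stable clean.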

Softmotions/iowow: The C11 persistent key/value database engine based on skiplist. adamansky: Yes, b+trees are more performant for reading compared to skip-list because of better data locality on memory/disk. But SL is better at insertion and sequential reads. 255 GB is a simple trade-off between disk space required per record and the ability to manage large data-sets (since SL is not a very space friendly data structure). Although I think 255 GB is good enough in many use cases.
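If skip lists are unfamiliar, here is a minimal in-memory sketch of the data structure. It is not iowow's on-disk format or API. The class names, the maximum level of 16, and the promotion probability of 0.5 are all assumptions made for illustration.

```python
import random


class _Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # next node at each level


class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # chance of promoting a node one more level

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 1  # highest level currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, find the last node whose key is smaller than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def put(self, key, value):
        update = self._predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite in place
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level):  # splice the node in at each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key, default=None):
        candidate = self._predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def items(self):
        """Sequential scan in key order along the bottom level."""
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]


sl = SkipList(seed=1)
for k in [5, 1, 9, 3, 7]:
    sl.put(k, str(k))
print(list(sl.items()))
```

Insertion only rewires a few forward pointers, and a sequential read is a walk along the bottom level, which matches the strengths adamansky mentions. The extra pointer arrays per node are the space cost.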

Microsoft/Picnic: The Picnic signature scheme is a family of digital signature schemes secure against attacks by quantum computers. This is a reference implementation of these schemes.

Simple Encrypted Arithmetic Library (SEAL): an easy-to-use homomorphic encryption library, developed by researchers in the Cryptography Research Group at Microsoft Research. SEAL is written in C++, and contains .NET wrappers for the public API. It has no external dependencies, so it is easy to compile in many different environments.
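To show what "computing on encrypted data" means, here is a toy sketch of the Paillier cryptosystem, which supports addition on ciphertexts. This is a swapped-in textbook scheme, not what SEAL implements, and it does not use SEAL's API. The primes are tiny and hard-coded for illustration, so it is completely insecure. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random


def keygen(p, q):
    """Build a Paillier key pair from two distinct primes (toy sizes here)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                  # common simple choice of generator
    mu = pow(lam, -1, n)       # valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m, rng=random):
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(1009, 1013)  # hypothetical toy primes
c_sum = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, c_sum))
```

Whoever holds only the ciphertexts can compute the encrypted sum without ever seeing 20 or 22.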

Opportunities and obstacles for deep learning in biology and medicine: Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation.

Security of homomorphic encryption: This document is an attempt to capture the collective knowledge at the workshop regarding the currently known state of security of these schemes, to specify the schemes, and to recommend a wide selection of parameters to be used for homomorphic encryption at various security levels. We describe known attacks and their estimated running times in order to make these parameter recommendations. We also describe additional features of these encryption schemes which make them useful in different applications and scenarios. Many sections of this document are intended for direct use as a first draft of parts of the standard to be prepared by the Working Group formed at this workshop.
