Stuff The Internet Says On Scalability For August 31st, 2018

Hey, it's HighScalability time:

This mind-blowing creation is from John Williamson. It's the first million integers, represented as binary vectors indicating their prime factors, laid out with UMAP. No, I really have no idea what that means either, but it did make me consider that our universe could be created by an algorithm. What are the wiggly cycles on the periphery? Groups of numbers that share a minimal number of prime factors; the further out a group sits, the more prime factors its numbers share. So the primes are at the core, ungrouped, since they have no shared prime factors to join a group with. Primorials should be furthest out.
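
For the curious, here's a minimal sketch of how you might build such a picture yourself (my guess at the recipe, not John Williamson's actual code; the UMAP settings are assumptions):

```python
import numpy as np
from scipy.sparse import lil_matrix
import umap  # pip install umap-learn

N = 100_000  # the original used the first million integers

# Sieve of Eratosthenes: find the primes up to N.
sieve = np.ones(N + 1, dtype=bool)
sieve[:2] = False
for i in range(2, int(N**0.5) + 1):
    if sieve[i]:
        sieve[i * i :: i] = False
primes = np.flatnonzero(sieve)

# Row m gets a 1 in column j iff primes[j] divides m.
X = lil_matrix((N + 1, len(primes)), dtype=np.float32)
for j, p in enumerate(primes):
    for m in range(p, N + 1, p):
        X[m, j] = 1.0

# Numbers land near each other when they share prime factors.
embedding = umap.UMAP(metric="cosine").fit_transform(X.tocsr())
```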

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • 800 million: internet users in China; 19,000: hours of audio from historic Apollo 11 mission; 2: mph Frodo walking pace to Mordor; 67%: daily trade volume faked; 60 TFLOPS: Data's theoretical peak, which is 8 NVIDIA Voltas, which you can rent on ec2 for $25 an hour; 7nm: an expense too far for GlobalFoundries; 98%: atoms replaced every year in human body; 150 billion: hours of Fitbit heart data; 4: times per second human brain oscillates in and out of focus; 27.9%: DNS requests over UDP from China to Google Public DNS are intercepted; 

  • Quotable Quotes:
    • Chris Hadfield: the best antidote to fear is competence.
    • @kellabyte: I’m making fun of GraphQL but I like it so far. I’m just having fun with it because it’s literally re-learning a new way to do the same thing a group of other people solved some of the same problems almost 20 years ago. But like I said, not a hater. I like GraphQL so far.
    • Corey Quinn: Kubernetes is Named After the Greek God of Spending Money on Cloud Services
    • @QuinnyPig: #SREcon @aaronblohowiak "Once upon a time when the stars were young we had an ELB outage" ...because ELBs are incredibly durable right up until "Oh shit! Load!" at which point TCP terminates on the floor.
    • @GossiTheDog: This is wild - the White House are rolling back cybersecurity baseline standards for government (saying they will issue new ones in the future), and ditching deployment of DNSSEC and IPv6 (as mandated under Obama admin) saying they are already in place (they aren’t).
    • @nicolefv: Outsourcing is a bad idea. (shocker) Low performers are almost 4x as likely to use functional outsourcing. Read about it and some of the consequences (search for “Misguided performers”) in the 2018 Accelerate State of DevOps Report 
    • Bernard Golden: I see Snowball Edge with EC2 as an AWS initiative designed to reduce the friction of migrating data into AWS with the ultimate goal of enabling applications to migrate to AWS, not a way to run AWS applications on-prem. In other words, this new Snowball variant isn’t about processing data on-prem, it’s about making Snowball a better on-ramp to AWS.
    • @allspaw: In "managing workload there are only four coping strategies: (1) shed load, (2) do all components but do each less thoroughly, thereby, consuming fewer resources, (3) shift work in time to lower workload periods, (4) recruit more resources." (Woods and Hollnagel, 2006)
    • Stacy Horn: In those days journalists wrote that I started Echo to provide a safe place for women on the Net. Bite me. I wanted to get more women on Echo to make it better.
    • @michael_beh: there are so many price comparisons between #serverless and non-serverless deployments missing costs for multi-AZ, multi-region, operational OS mgmt, load balancer, pre-prod envs, etc. All of them are part of GBsec price in serverless. It's too often an apples/oranges comparison.
    • Andy Hertzfeld: Steve got mad. He goes, “The Apple II was going to be dead. Your OS is going to be obsolete before it’s finished!” And then he said something like, “The Macintosh is the future of Apple and you are going to start now.” I just wanted one day, really. It was Thursday afternoon. I said “Monday” because Monday seems to be a good time to make a new start. But he goes, “No! You’re going to start on it now!” And he went and he pulled the plug on my Apple II, which had my code not saved. I was right in the middle of working on it. He just yanks the plug out, and then without pausing he picks up the Apple II and starts walking away.
    • @cloud_opinion: In the short term, Serverless is a hybrid scenario - you will have some/many components of your app as serverless. Would be rare to see a 100% serverless app in the wild. For high volume apps, use FaaS to burst, but not for baseline load. Agree/Disagree?
    • Sir_Cmpwn: I used to work at SpaceX on the team that did a piece of software called "WarpDrive". It was a massive monolithic ASP.NET application, with large swaths done with ASP.NET WebForms and a slow frontier of ASP.NET MVC gradually growing when I was working there. This application was responsible for practically everything that ran the factory: inventory, supply chain management, cost analysis, etc. Elon is a big Windows fan and pushed hard to run the whole shop on Microsoft tech. Thankfully the rockets fly with a heavily customized Linux install.
    • @jfagone: I want to tell you a story. For 3 years I’ve been researching Elizebeth Smith Friedman, a puzzle-solving heroine of the world wars. A codebreaker solves secret messages without knowing the key. Starting from scratch, at age 23, Elizebeth became one of the best ever. “THE SPY STUFF.” Elizebeth caught Nazi spies in WWII. She hunted Nazis. 
    • @JeffHandley: In summary, OData was a way to serialize a SQL statement into a URL. When first applied, it gave too much power to the client, allowing it to use joins, where clauses, and sorting. A lot of that power can be dialed back now, but I still found it challenging to limit the exposure.
    • @QuinnyPig: #SREcon @aaronblohowiak "We have global demand that varies by hour. We have to serve the peak of that demand, so that's where we scale things. Once we go multi-region... do we scale to 100% of peak in every region?"
    • Vadim Tkachenko: If you are looking to improve throughput in IO-bound workloads, either increasing GP2 volumes size or increasing IOPS for IO1 volumes is a valid method, especially for the MyRocks engine.
    • Paul Buchheit: [re: Gmail] None of the other web mail providers had autocomplete. Now you don’t really even think about it, but it makes a big difference. You can send email so fast and you don’t have to remember the addresses. To my knowledge, we were the first web mail provider to do it. Desktop products would have things like that sometimes, but no web mail was doing that at the time.
    • DSHR: FileCoin won't be able, as S3 does, to claim 11 nines of durability and triple redundancy across data centers. So the real competition is S3's Reduced Redundancy Storage, which currently costs $23K/PB/month. Assuming that Amazon continues its historic 15%/year Kryder rate, storing a Petabyte in RRS for a decade is $1.48M. So, if you believe cryptocurrency "prices", FileCoin's "investors" pre-paid $257M for data storage at some undefined time in the future. They could instead have, starting now, stored 174PB in S3's RRS for 10 years. So FileCoin needs to store at least 174PB for 10 years before breaking even. (A quick check of this arithmetic follows these quotes.)
    • Yodiddlyyo: If you're using cpanel now, I would suggest not going straight to AWS. Unless you really want to and have time to teach yourself stuff. Go to digital ocean first. It's the best middle ground. Its cheaper, for example I'm hosting 2 react apps and 4 wordpress sites on a single $5 server, and they have a shit ton of guides. If you're using a VPS. Theres no need to use cpanel at all.
    • simonjgreen: I may be feeling a touch pedantic, perhaps I just live in a different world being on the ISP side, but the Internet is not centralised at all. Large quantities of traffic ends up going to certain central locations, but the network itself is decidedly decentralised. Just look at LINX for example and the other major IXs, nobody there is telling everyone else where to send their packets, they just go where needed. In fact LINX as an organisation is a members organisation run by its members with each one, big or small, having one vote. IXs, BGP, RIRs like RIPE, and the proliferation around the globe of access ISPs, all prove how decentralised the Internet is.
    • Susan Rambo: An IP design house has developed a scalable DRAM replacement using carbon nanotubes (CNTs) that abolishes the DRAM refresh rate, stores the content permanently, has better timing than DRAM and is scalable. And it lasts for somewhere between 300 and 12,000 years.
    • DSHR: Video games are an important cultural artifact. Unlike books, movies, and even music, national libraries and other archives typically don't have organized programs to collect and preserve them, much less make them available to scholars. AFAIK the Internet Archive's accessible collections of console and arcade games are unique among established archives, but they lack Nintendo's catalog. Figuring out a way for institutions to preserve this history without undue legal risk is important.
    • NuclearArmament: This is stupid, stop trying to milk silicon beyond 11 nm and start using gallium arsenide or indium gallium arsenide. The Fujitsu AP2000 used BiCMOS and GaAs fabrication back in the early 1990s, for crying out loud. Get rid of silicon, it's outdated and needs to be replaced! You have BILLIONS of dollars and get government subsidies, why not take a risk?!
    • @cloud_opinion: We need to document Serverless spend best patterns: 1. Using cloud functions for infrequent invocations will save money 2. Using Appengine flex for constant invocations will save money 3. Don't use datastore as a replacement for app state, instead use cloud storage
    • @QuinnyPig: "Teams that adopt essential cloud characteristics are 23 times more likely to be elite performers." You can't slap a sign on your datacenter and call it Cloud.
    • Martin Thompson: We have a culture of considering the quality of a system too late, or not in the context of business requirements. This can manifest in the poorly named “non-functional requirements” which are better termed as quality attributes. In other engineering disciplines, the quality attributes are an integral part of the design process, software is only beginning to evolve into an engineering discipline. The quality of a software system to deliver appropriate resilience, responsiveness, elasticity, usability, security, or other such requirements will distinguish software as an ad-hoc discipline from engineering.
    • Alexandra Robbins: Yet many perceived popular students demand that group members stick to the same bland fare. In the school setting, the higher a group’s status, the more likely it is to require unanimity. “The more influence a group’s members exert on each other . . . the less likely it is that the group’s decisions will be wise ones,” journalist James Surowiecki wrote in The Wisdom of Crowds. “The more influence we exert on each other, the more likely it is that we will believe the same things and make the same mistakes.”
    • Adam Tornhill: Revisiting these memories of failures past has been painful. Each one of them cost me lost sleep and, consequently, a caffeine induced headache. My idea was to recall the stories as I remember them without sparing myself (yes, I'm still embarrassed over the design that lead to the year 2k bug). The reason for this transparency is because I think there's a common theme that transcends any personal lessons I might have learned from repeated failures: the worst errors occurred through an interaction between the code and a surrounding system. Sometimes that surrounding system is an API, at other times it might be the operating system or hardware, and sometimes the external system is other people who we communicate with to capture the requirements. That is, the worst system failures occurred through interactions with something we don't necessarily control.
    • phakding: You would be surprised to know the majority of trading companies/exchanges use Java for their trading platforms. So does the majority of companies in the finance sector. I have been working in this sector for over 10 years and have interviewed for many of them.
    • afandian: I'm currently using Hetzner Cloud in production. Not for user-facing services, but background data churning. Very happy with the service so far, and it has a Terraform plugin, which makes it convenient to use. I think the feature-set and cost are in proportion, if you see what I mean. Compared to AWS EC2 it's a great value proposition. Interestingly, they recently announced a 'dedicated vCPU' option for about 10x the price. Top option is €269 vs €29, 32 GB vs 128 GB RAM. But I've found the performance of the default product to be perfectly good for my needs. The major shortfall with Hetzner I have found is that the storage isn't scalable. You get what you get. You can choose SSD RAID or Ceph, but you still get the same allowance.
    • Jonathan Corbet: Multiple notifications can be consumed without the need to enter the kernel at all, and polling for multiple file descriptors can be re-established with a single io_submit() call. The result, Hellwig said in the patch posting, is an up-to-10% improvement in the performance of the Seastar I/O framework. More recently, he noted that the improvement grows to 16% on kernels with page-table isolation turned on.
    • Sally Davies: merely running can push the Achilles’ tendon to over 75 per cent of its ultimate tensile strength, whereas weightlifters can experience stresses of over 90 per cent of the strength of their lumbar spines, when they are hefting hundreds of kilogrammes. How does biology handle these loads? The answer is that our bodies constantly repair and recycle their materials. In tendons, collagen fibres are replaced in such a way that, while some are damaged, the overall tendon is safe. This constant self-repair is efficient and inexpensive, and can change based on the load. 
    • sliken: Each generation of fab is significantly more expensive than the last. Pushing the edge of physics is expensive. Transistors are so small these days they can't use the normal light frequencies any more. Things like precisely focusing light gets trickier when extreme frequencies are used. The result is you have to almost double your volume with each generation, as a result there are less and less fabs running the current process. Makes AMD decision to split off Global Foundries look pretty good in hindsight. Even those companies with the leading process make a substantial number of chips on older process. So the bleeding edge CPU gets the latest greatest, but the chipset, flash chips, and memory chips are often a generation or more behind. Seems realistic that if you are behind and don't have a huge customer (like apple or nvidia) lined up that you just save a few $billion and let TMSC have it. TMSC will of course charge more without competition, and make chips using TMSC less competitive. If Samsung can't compete with TMSC (which remains to be seen) TMSC might well delay future shrinks. The market loves Moore's law, but the stress is really starting to show. Physics is starting to interfere with what the market wants. Things like CPU clock speeds stagnating, power per chip doubling for the first time in the newest generation, and of course the ever lengthening product cycles. It does make you wonder when AMD and Intel double the normal CPU socket from 95 watts to 180 watts or so. What are they going to do for the next generation?
    • Jennifer Ouellette: Fire ants (and ants in general) provide another textbook example of collective behavior. A few ants spaced well apart behave like individual ants. But pack enough of them closely together, and they behave more like a single unit, exhibiting both solid and liquid properties. You can pour them from a teapot like ants, as Goldman's lab demonstrated several years ago, or they can link together to build towers or floating rafts--a handy survival skill when, say, a hurricane floods Houston. So it's not surprising that they also excel at regulating their own traffic flow. You almost never see an ant traffic jam. When an ant encounters a tunnel in which other ants are already working, it retreats to find another tunnel. It also helps that only a small fraction of the colony is digging at any given time: 30 percent of them do 70 percent of the work.
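
DSHR's FileCoin quote above packs a lot of arithmetic into a few sentences, so here's a quick sanity check of it (a sketch assuming the 15%/year Kryder rate compounds annually, prices as quoted):

```python
# $23K/PB/month today, declining 15% per year for a decade.
monthly_per_pb = 23_000
decade_cost = sum(monthly_per_pb * 12 * 0.85**year for year in range(10))
print(round(decade_cost))          # ~$1.48M to store 1 PB for 10 years
print(round(257e6 / decade_cost))  # ~174 PB break-even for $257M pre-paid
```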

  • Wow, I thought homomorphic encryption was for the far future. Sometimes the future sneaks up on you. Homomorphic encryption has been around for 30 or so years. HE allows you to perform operations on encrypted data as if it were plain-text. HE has been thought impractical because it is computationally expensive. In Protecting Your Data In Use At Enveil, founder Ellison Anne Williams describes how her company uses homomorphic encryption to ensure analytical queries can be executed without ever decrypting data. The assumption is an attacker can see the data on disk and in memory. Nothing is ever decrypted, so it doesn't matter. This technology was developed while working for US intelligence agencies. Their breakthrough was to make the technology scale vertically and horizontally. It's API based, so it can plug and play into existing systems. A key use case is to perform encrypted computation over unencrypted data, supporting a zero-trust interface between the data supplier and the computation. It unlocks data brokerage opportunities because you can have secure and private multi-tenant access to a shared data lake. There's no risk because each party is protected by encryption; you can't tell what they are doing. It allows processing the most sensitive data in the public cloud. Encrypting at rest isn't enough because the data and the query are not encrypted during use. The primary attack vector is not data at rest or in transit; attacks are easiest when the data has already been decrypted, which is in memory and in processing. Even the search itself can contain a lot of sensitive information. The public cloud can now be used for sensitive operations. Unfortunately, they didn't talk about performance or how the technology actually works at all.
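
Since the talk skipped the mechanics, here's a toy sketch of the core idea using the classic Paillier scheme, which is additively homomorphic. This is not Enveil's (unpublished) technology, and the key sizes here are hopelessly insecure; it just shows what computing on data without ever decrypting it looks like:

```python
import math
import random

# Toy key generation with tiny primes -- utterly insecure, demo only.
p, q = 1789, 2003
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael's lambda for n = p*q
g = n + 1                      # standard generator choice
mu = pow(lam, -1, n)           # valid because L(g^lam mod n^2) = lam mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n

a, b = encrypt(20), encrypt(22)
# An untrusted party multiplies the ciphertexts; the plaintexts add,
# and it never sees 20, 22, or 42.
assert decrypt((a * b) % n2) == 42
```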

  • Start a service. Open it up with an API. Lure children to your cabin in the woods with candy. Fatten them up with treats. Then eat them. 23andMe is shuttering their API. Don't go to the cabin in the woods.

  • Unlike a purchase at Costco, once you've bought a microservices architecture it's hard to take it back. Segment told the story of their transition back to a monolith from microservices on this Changelog podcast. The major issue: they had thousands of microservices, and keeping updates synced across all the services became impossible. The mistake was not realizing microservices are as much about team organization as software architecture. A service is owned by a team, so dependencies don't matter. When one small team is in charge of making updates across all those separate code bases, it just can't work. A team per service. If you just have one team, then you have one service, which is effectively a monolith. There's also a good conversation on relearning lessons about how to deal with queues. They ended up with a queue per source-destination pair as the best way to minimize head-of-line blocking (a sketch of the idea follows below), along with spinning up load handling VMs, condition QoS, etc. Oh, and advice to programmers: read the comments on Hacker News on your articles. Yes, you'll get told you're an idiot, but there are also a lot of smart HN posters; you'll learn a lot of things that you need to learn. Skip suffering through ontogeny recapitulating phylogeny. Also: Goodbye Microservices: From 100s of problem children to 1 superstar, and Centrifuge: a reliable system for delivering billions of events per day, on Hacker News.
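
A minimal sketch of the queue-per-pair idea (illustrative names, not Segment's actual Centrifuge code): a slow destination backs up only its own queue instead of stalling everyone's events:

```python
from collections import defaultdict, deque

queues = defaultdict(deque)   # (source, destination) -> pending events

def enqueue(source, destination, event):
    queues[(source, destination)].append(event)

def dispatch_once(deliver):
    # Visit every pair's queue in turn; a backed-up destination only
    # delays its own pair's events (no head-of-line blocking).
    for pair, q in list(queues.items()):
        if not q:
            continue
        event = q.popleft()
        try:
            deliver(pair, event)
        except Exception:
            q.appendleft(event)   # leave for a later retry; only this pair stalls
```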

  • Good set of Notes from JupyterCon — How Project Jupyter is Enabling Large Scale Data Science: Netflix is betting really big on notebooks and is migrating over 10k workflows to them.

  • Are extra database accesses because of GDPR costing you money? Reduce accesses using a Bloom filter. Saving Money and Protecting Privacy With Bloom Filters: It is impossible to guarantee that data from opted-out users won’t end up in our ingestion pipeline due to the inherently unpredictable nature of distributed systems, so we have to do a GDPR status check against every data-point...At current prices (2018) the required additional [DynamoDB] read capacity would cost an extra $4500 per month at our present volume...Our implementation of the GDPR data-point filter is as follows: We fill a bloom filter with opted-out users, serialize it, and upload it to S3. The bloom filters need to be recreated each time because you can only add, not remove keys...When we verify that the data is from an opted-out user, we simply drop the data-point...in the end we actually only need to make about 25 dpps dynamo calls. This is only about 0.05% of all the data-points that we actually need to hit the database for and only amounts to about $2 of extra dynamo charges per month. The size of the serialized bloom filter files amount to about 750 KB each, so there is room to decrease the false positive rate even further as long as available memory is not an issue. Also, take a look at Cuckoo Filter: Practically Better Than Bloom, they allow deletion.
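
Here's a minimal sketch of the pattern (a toy Bloom filter, not the authors' code; the size and hash count are illustrative): check the filter first and pay for a DynamoDB read only on the rare, possibly false-positive, hit:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=8 * 750 * 1024, num_hashes=7):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions from salted SHA-256.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str):   # keys can be added but never removed
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

opted_out = BloomFilter()
opted_out.add("user-123")

def should_drop(user_id, check_db) -> bool:
    # A definite "no" never touches the database; only the tiny
    # fraction of (possible) hits pays for a real read.
    return opted_out.might_contain(user_id) and check_db(user_id)
```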

  • Terabytes Of RAM are still the future. Is your language ready? An average pause time of 1ms and a max of 4ms. Java's new Z Garbage Collector (ZGC) is very exciting: So why the need for a new GC? After all, Java 10 already ships with four that have been battle-tested for years and are almost endlessly tunable. To put this in perspective, G1, the most recent GC in Hotspot, was introduced in 2006. At that time the biggest AWS instance available was the original m1.small packing 1 vCPU and 1.7GB of RAM; today AWS will happily rent you an x1e.32xlarge with 128 vCPUs and an incredible 3,904GB of RAM. ZGC’s design targets a future where these kinds of capacities are common: multi-terabyte heaps with low (<10ms) pause times and impact on overall application performance (<15% on throughput)...To achieve its goals ZGC uses two techniques new to Hotspot Garbage Collectors: coloured pointers and load barriers...Pointer colouring is a technique that stores information in pointers (or in Java parlance, references) themselves...Load barriers are pieces of code that run whenever an application thread loads a reference from the heap. The load barrier’s job is to examine the reference’s state and potentially carry out some work before returning the reference.
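
A toy illustration of the two techniques (bit positions and names here are purely illustrative, not ZGC's actual layout):

```python
# Pointer colouring: stash GC metadata in otherwise-unused high bits
# of a 64-bit pointer, leaving the low bits as the real address.
MARKED   = 1 << 42
REMAPPED = 1 << 43
ADDRESS_MASK = (1 << 42) - 1

forwarding = {}   # old address -> new address, maintained by the collector

def address(ptr):
    return ptr & ADDRESS_MASK

def load_barrier(ptr, good_colour):
    # Runs on every reference load from the heap. If the reference
    # already carries the current "good" colour, take the fast path.
    if ptr & good_colour:
        return ptr
    # Slow path: the object may have moved; follow the forwarding
    # table, then return a reference stamped with the good colour.
    new_addr = forwarding.get(address(ptr), address(ptr))
    return new_addr | good_colour
```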

  • Assuming it's true, is this really a horror? It works. This is how stuff gets made. Do you really want them to stop everything to do a 6 month rewrite that probably wouldn't work anyway? @atomicthumbs: "A former Tesla employee, who worked on their IT infrastructure, is posting in a subforum of a subforum, a little-known place for funy computer forgotten by time. His NDA has expired. He has such sights to show us. Join me and I will be your silent guide into a world of horror."  Good discussion on reddit and on Hacker News.

  • 19 Serverless Microservice Patterns for AWS: The Simple Web Service; The Scalable Webhook; The Gatekeeper; The Internal API; The Internal Handoff; The Aggregator; The Notifier; The FIFOer; The “They Say I’m A Streamer”; The Strangler; The State Machine; The Router; The Robust API; The Frugal Consumer; The Read Heavy Reporting Engine; The Fan-Out/Fan-In; The Eventually Consistent; The Distributed Trigger; The Circuit Breaker. Serverless microservices should at least adhere to the following standards: have their own private data; be independently deployable; utilize eventual consistency; use asynchronous workloads whenever possible; keep services small, but valuable.

  • Isn't most computer system sickness diagnosed based on observational studies? 5 Questions: John Ioannidis, a celebrated medical researcher, calls for us to stop using observational studies: Simply by observing what people eat and trying to link this to disease outcomes is moreover a waste of effort. These studies need to be largely abandoned. We’ve wasted enough resources and caused enough confusion, and now we need to refocus. 

  • AWS has a nice series of short and to the point videos on This Is My Architecture

  • Mostly about the game play, but there are a few tech details hidden inside. How World of Warcraft Was Made: "The way we would run our realms back then was everything that happened to a player on a realm was done on the hardware for that realm. So if you had a realm that was very highly populated, you had a certain number of [server blades] that were allocated to that and instances were run on those blades. You couldn't run it with your friend on the other server. What would happen is if we ran out of resources, you would get 'Transfer aborted. Instance not found' and that would happen on higher population realms pretty frequently," says technical director Patrick Dawson. "How do we add hardware, where do we add hardware, to what servers, and how does that work? Now that we've moved to more of a cloud-based mentality, that doesn't really happen anymore, but that was a challenge..." "That was the first time we've had to put a player in their own instance and did this across the population of World of Warcraft," says Patrick Dawson. "So Garrisons actually exists as an instance, much like Deadmines or Molten Core would have. Except instead of having 30 or 40 people in it, you would have one. An instance is expensive; it's not cheap, it's very costly to spin up an instance. So when you spin one up for each player, World of Warcraft can get a little expensive. So we had to do really clever things with memory sharing. So what we actually do is start a process and then have a thread for each garrison."

  • Amplification is a problem for SSDs too. Real world SSD wearout: Redis+RDB generates a ton of disk writes and it depends not on the amount of changes in Redis db, but on DB size and dump frequency...Actively used SWAP on SSD is probably a bad idea...Bad database design or access patterns might produce a lot of temp files writes.

  • Lessons Learned: Switching from CircleCI to Google Cloud Build: We recently migrated Focuster’s infrastructure over to Google’s Kubernetes service (GKE) and I noticed they had their own build service, Cloud Build (GCB)...Google Cloud Build on the other hand provides a free tier of 120 build minutes per day on their standard size container, which is an n1-standard-1 machine type (1 core, 3.75GB RAM)...Above the free tier the pricing is only $0.003 per build-minute. My builds are averaging about 10 minutes, so I’d have to do over 1667 builds a month, or more than 50 builds a day, to hit the $50 that CircleCI charges...CircleCI basically charges you based on your concurrency level and gives you unlimited minutes. GCB has a practical limit of 10 concurrent builds but charges you per build-minute.
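
The break-even arithmetic is easy to check (2018 prices as quoted):

```python
gcb_per_minute = 0.003       # $ per build-minute on the standard machine
avg_build_minutes = 10       # the author's average build length
circleci_monthly = 50.0      # CircleCI's monthly charge

cost_per_build = gcb_per_minute * avg_build_minutes   # $0.03 per build
breakeven = circleci_monthly / cost_per_build         # ~1666.7 builds/month
print(breakeven, breakeven / 30)                      # ~1667/month, ~56/day
```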

  • The distributed block hash join algorithm makes join operations up to 23 thousand times faster than they are with the nested loop algorithm. Lab Notes: How We Made Joins 23 Thousand Times Faster, Part Three: I will show you how we modified our existing work to create a distributed block hash join algorithm...We identified two candidate approaches for this work: The locality approach; The modulo distribution approach...The modulo distribution approach reduces the total number of row reads because each node only has to compare its assigned subsets of left-hand and right-hand rows. Contrast this with the locality approach which still requires the hash join algorithm to compare against the whole right-hand table for every block.
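
A minimal single-process sketch of the modulo distribution approach (illustrative, not the authors' implementation): partition both tables by hash(key) mod the node count, then each node hash-joins only its own slice:

```python
from collections import defaultdict

def modulo_partition(rows, key, num_nodes):
    # Assign each row to a node by hashing its join key.
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts

def hash_join(left, right, key):
    # Classic in-memory hash join: build on left, probe with right.
    table = defaultdict(list)
    for l in left:
        table[l[key]].append(l)
    return [(l, r) for r in right for l in table.get(r[key], [])]

def distributed_join(left, right, key, num_nodes=4):
    lparts = modulo_partition(left, key, num_nodes)
    rparts = modulo_partition(right, key, num_nodes)
    out = []
    for node in range(num_nodes):   # each "node" sees only its own slice
        out += hash_join(lparts[node], rparts[node], key)
    return out

users  = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"id": 2, "item": "gpu"}, {"id": 1, "item": "ssd"}]
print(distributed_join(users, orders, "id"))
```

Matching rows land on the same node, so no node ever compares against the other nodes' rows; that's the read reduction the post describes.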

  • Pop another stack on the stack. Building a Full Graph Stack: A full graph stack is focused on a GraphQL schema specification, possibly enhanced with directives...The GraphQL schema contains a graph of core types and their relationships. Apollo offers a set of services that greatly simplify building GraphQL interfaces. The portion of a GraphQL server specification that describes the data graph is called the TypeDefs. The TypeDefs are central to everything in the full graph stack. Your job is to create the TypeDefs to show your app graph...One or more converters can generate a full GraphQL server from your TypeDefs, and generate the resolvers needed to implement any queries or mutations (updates) on a back end database...My team had independently decided that we wanted to go with React, GraphQL, Apollo, and a graph database. The GRANDstack consists of all of these, and uses the package neo4j-graphql-js to generate (Prisma style) all of the mutations and instant resolvers from TypeDefs.

  • An update from space on decentralized vs centralized architectures. Chris Hadfield in his Space Exploration Masterclass talks about the evolution of Mission Control in the space program. In the beginning it could have just been called Launch Control, because that's all we really did. There was no central control at Houston. Astronauts were flown out to various parts of the globe so that when they saw the capsule rise over the horizon they could talk to it. Over time, launch control evolved into mission control as missions became more ambitious and as technology improved. In the US, once we had satellites, Mission Control could be centralized in Houston. The Houston in "Houston, we have a problem" is Mission Control. In these days of international cooperation there are multiple Mission Controls spread throughout the world: one each in Russia, the US, India, Japan, Canada, and Germany. As a spacecraft flies around the world it talks to each Mission Control.

  • iamtrask/Grokking-Deep-Learning: early preview chapters of the book "Grokking Deep Learning"

  • prisma/prisma: a performant open-source GraphQL ORM-like layer doing the heavy lifting in your GraphQL server. It turns your database into a GraphQL API which can be consumed by your resolvers via GraphQL bindings.

  • Microsoft/dowhy:  a Python library that makes it easy to estimate causal effects. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

  • ballerina.io: Cloud Native Programming Language. Ballerina is a simple programming language whose syntax and platform address the hard problems of integration. Ballerina is a general purpose, concurrent, transactional, statically and strongly typed programming language with both textual and graphical syntaxes. Its specialization is integration - it brings fundamental concepts, ideas and tools of distributed system integration into the language and offers a type safe, concurrent environment to implement such applications. These include distributed transactions, reliable messaging, stream processing, workflows and container management platforms.

  • PHP 7.3: As of this writing, PHP 7.3 is in its Beta 2 phase, and according to the Preparation Tasks Timetable, it will be officially released in mid-December. It will bring us gifts like flexible heredocs and nowdocs, trailing commas in function calls, list() reference assignments and more.

  • SystemsApproach/book (pdf, epub, mobi): This site contains source text for Computer Networks: A Systems Approach, now available under terms of the Creative Commons (CC BY 4.0) license. The community is invited to contribute corrections, improvements, updates, and new material under the same terms.

  • Building the Space Elevator: Lessons from Biological Design: We draw inspiration from natural biological structures, such as bones, tendons and ligaments, which are made up of smaller substructures and exhibit self-repair, and suggest a design that requires structures to operate at significantly higher stress ratios, while maintaining reliability through a continuous repair mechanism. We outline a mathematical framework for analysing the reliability of structures with components exhibiting probabilistic rupture and repair that depend on their time-in-use (age). Further, we predict time-to-failure distributions for the overall structure. We then apply this framework to the space elevator and find that a high degree of reliability is achievable using currently existing materials, provided it operates at sufficiently high working stress ratios, sustained through an autonomous repair mechanism, implemented via, e.g., robots.

  • Review of "Concurrent Log-Structured Memory" from VLDB 2018": The paper is worth reading - the ideas are interesting and the performance results are thorough. Nibble is an example of index+log. Their focus is on huge many-core servers. I wonder how RocksDB would do a a server with 240 cores and many TB of DRAM. I assume there might be a few interesting performance problems to make better.

  • The Evolutionary Origins of Hierarchy: In computational simulations, we find that networks without a connection cost do not evolve to be hierarchical, even when the task has a hierarchical structure. However, with a connection cost, networks evolve to be both modular and hierarchical, and these networks exhibit higher overall performance and evolvability (i.e. faster adaptation to new environments). Additional analyses confirm that hierarchy independently improves adaptability after controlling for modularity. Overall, our results suggest that the same force–the cost of connections–promotes the evolution of both hierarchy and modularity, and that these properties are important drivers of network performance and adaptability. In addition to shedding light on the emergence of hierarchy across the many domains in which it appears, these findings will also accelerate future research into evolving more complex, intelligent computational brains in the fields of artificial intelligence and robotics.

  • Leveraging Elastic Demand for Forecasting: Demand variance can result in a mismatch between planned supply and actual demand. Demand shaping strategies such as pricing can be used to shift elastic demand to reduce the imbalance. In this work, we propose to consider elastic demand in the forecasting phase. We present a method to reallocate the historical elastic demand to reduce variance, thus making forecasting and supply planning more effective.

  • Construction of integrated gene logic-chip: Here, we have made an orthogonal self-contained device by integrating an actuator and sensors onto a DNA origami-based nanochip that contains an enzyme, T7 RNA polymerase (RNAP) and multiple target-gene substrates. This gene nanochip orthogonally transcribes its own genes, and the nano-layout ability of DNA origami allows us to rationally design gene expression levels by controlling the intermolecular distances between the enzyme and the target genes. We further integrated reprogrammable logic gates so that the nanochip responds to water-in-oil droplets and computes their small RNA (miRNA) profiles, which demonstrates that the nanochip can function as a gene logic-chip.

  • Mind Your State for Your State of Mind: This "mind your state for your state of mind" article looks at the history of interactions of applications and storage/databases, and charts their co-evolution as they move into the distributed and scalable world.