Date: 2020-09-18

Hey, it's HighScalability time!

1/8th: failure rate for Microsoft's underwater data center—after two years—when compared to conventional data centers. Only 8 out of 855 servers failed. Why? No humans to mess things up? Shielding from cosmic rays? Nitrogen atmosphere? Cooler?

100,000+: requests per second handled by Shopify.

22.2%: US electricity supplied by renewable energy.

40,000 years: age of the oldest technical system ever built by humans, a series of fish traps in Australia. Last use was 1915.

~330M: traces and ~8.5B spans per day at Slack.

~60%: of organizations run a mix of SQL and NoSQL databases. Only 14% of organizations run exclusively NoSQL databases.

$1 million: not tempting enough to hack Tesla. Other companies were less fortunate.

0.81%: Backblaze's Annualized Failure Rate (AFR) for Q2 2020, down from 1.07% in Q1 2020. One year ago (Q2 2019), the quarterly AFR was 1.8%. Three drive models had 0 drive failures: the Toshiba 4TB, the Seagate 6TB and the HGST 8TB.
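For readers who want to reproduce this kind of figure, here is a minimal sketch of the AFR arithmetic, assuming the drive-days formula Backblaze describes in its drive stats reports: failures divided by drive-years of service, times 100. The fleet size and failure count in the example are hypothetical, not Backblaze's data.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return AFR as a percentage: failures per drive-year of service, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical quarter: 10,000 drives each running 91 days, with 20 failures.
afr = annualized_failure_rate(failures=20, drive_days=10_000 * 91)
print(f"AFR: {afr:.2f}%")
```

Counting drive days rather than drives keeps the rate fair when drives are added or retired partway through the quarter.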

22%: ecommerce penetration in the US. It was 17% a few months ago. 5 years of growth in three months.

5%: of world's websites hosted on Wix. 700 million uniques per month.

43%-67%: faster SQL Server backup by writing to multiple files.

0: score for the meat puppet F-16 fighter pilot against an AI. DeepMind becomes DeepDeath.

$720 billion: wasted on failed IT replacement efforts.

$734.38: Joe Emison's itemized monthly bill for a full-stack insurance company running on serverless.

11 to 60Mbps: Starlink download speeds. Ping ranges from 31ms to 94ms. Not bad, but not stratospheric either.

£200m: British companies paid in ransomware last year.

10x: more outages by ISPs when compared to cloud providers.

50x: chip cooling improvement over typical microchannel cooling approaches. It uses an optimized 3D structure to extract heat before it propagates, rather than wait as is done with a heat sink. By increasing the heat flux that can be managed, many more devices can be integrated on a chip. Not only that, we can start having integrated power chips. This is a new thing. It could have an impact similar to that of silicon microchips.

Iain Banks: “It happens.” [being hacked] Hippinse sighed. “Not to Culture ships, as a rule; they write their own individual OS as they grow up, so it’s like every human in a population being slightly different, almost their own individual species despite appearances; bugs can’t spread.”

@TimSweeneyEpic: Two facts about Apple: 1) Apple is #3 in the world in game revenue 2) Apple doesn’t make games

@RituFM: F.R.I.E.N.D.S. OF PRODUCT 1. Sales: Joey 2. Marketing: Rachel 3. CEO: Monica 4. Engineering: Ross 5. Support: Chandler 6. Rest: Phoebe Me: Paul Rudd ;)

iRobot CEO: Thinking that autonomy was the destination was where I was just completely wrong

@raganwald: “Why are you asking for this intellectual property and non-compete agreement?” We don’t want you taking what you learned here, somewhere else. “Why are you offering me a job in the first place?” We want to take advantage of what you learned somewhere else. “Siri, playback.”

tehlike: As an engineer that worked both at google and Facebook, I vastly prefer googles monorepo on perforce. Combining that with citc was pretty solid way to develop your project. Hg is a bit of a nightmare in the wfh situation. Really slow, hangs for a long time if you haven't synced in a few days. Yes, im sure there are ways to tweak, but not sure if you can tweak them enough!

45nshukla: Moving 25TB data from one S3 bucket to another took 7 engineers, 4 parallel sessions each and 2 full days

redm: I really love this offering and I don't think it gets enough attention. We are an on-prem company and we use CloudFlare. Our users pay for that latency (in time) for us to traverse our IP providers to get to a CF pop. Since all our traffic goes over CF, directly connecting makes a lot more sense. I'm going to investigate further for the latency benefits. I've also backhauled lots of IP over the years and it can be a real pain. Fiber cuts are common, keeping redundant wave service or dark fiber drives up the cost, and in the end, it's often cheaper to hand off to an IP provider's meshed network than to backhaul any distance for latency.

Evan Ackerman: iRobot is announcing a major new software update that represents a significant shift of its overall approach to home robot autonomy. Humans are being brought back into the loop through software that tries to learn when, where, and how you clean so that your Roomba can adapt itself to your life rather than the other way around.

SpectralCoding: The correct answer is S3 Batch Operations, using the PUT object copy functionality. We had to move a large on-premise backup destination bucket from us-east-1 to us-east-2. It resulted in my post here Cross Region S3 Transfer Speed 50Gb/s? (moving 122TB in about 5 hours).
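The two S3 anecdotes above are easier to compare as average throughput. A quick back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes) and reading "2 full days" as 48 hours; the function name is made up for illustration.

```python
def throughput_gbps(terabytes: float, hours: float) -> float:
    """Average transfer rate in gigabits per second (decimal units)."""
    bits = terabytes * 10**12 * 8
    seconds = hours * 3600
    return bits / seconds / 10**9


manual_copy = throughput_gbps(25, 48)   # 25 TB over 2 full days
batch_copy = throughput_gbps(122, 5)    # 122 TB in about 5 hours

print(f"parallel manual sessions: {manual_copy:.2f} Gb/s")
print(f"S3 Batch Operations:      {batch_copy:.1f} Gb/s")
```

Under those assumptions the batch copy averages a bit over 50 Gb/s, consistent with the post title SpectralCoding mentions, while the manual effort averages a little over 1 Gb/s.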

manigandham: GCP and Azure both have much better built-in tooling that would make this a few clicks. Their storage system design is also much better. It's unfortunate that the industry has standardized around S3 just because it's a first mover rather than pushing Amazon's product to get better.

pythonpoole: A key difference is that Cosmos DB is designed so that it can be effectively used as a drop-in replacement for a traditional relational database supporting SQL queries whereas DynamoDB is not designed this way. DynamoDB does not support SQL-like queries. DynamoDB instead has a proprietary API which is not designed to work with relational data. While secondary indexes are supported, it's not like Cosmos DB where all properties are auto-indexed. Ultimately, you won't get the same degree of query flexibility that you get with CosmosDB. With DynamoDB you generally first have to organize your data in a way that is optimized for the types of queries you want to perform.

shakezula: I used to work in the music industry professionally, on the ground level doing booking and management. This trend has been happening slowly for nearly a decade but it’s finally here. Rap and Hip Hop figured out a long time before most other genres that rapid small releases was a far better way to keep hype and sales up. Before Spotify was a thing, the shift was happening with YouTube but it wasn’t as predominant. Now it’s basically assumed you’ll be releasing singles every month. The music isn’t your product, the music is your marketing. The shows, the merch, your influence - that’s your product.

zelly: Innovation happens on the factory floor. Pretty soon the countries we outsourced to will come up with better designs too. The U.S. thesis for outsourcing is that the manufacturing countries are filled with braindead automatons who can't compete with our "Designed in California". That may have been true in the 20th century when the U.S. brain drained all the top talent, but now other nations are in a position to pay their engineers more than U.S. companies. The "Designed in California" cope can only last so long.

@Rainmaker1973: NASA only uses 15 digits of π for calculating interplanetary travel. At 40 digits, you could calculate the circumference of a circle the size of the visible universe with an accuracy that'd fall off by less than the diameter of a single hydrogen atom
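To get a feel for why 15 digits is enough, here is a small sketch that measures the circumference error introduced by cutting π off after 15 decimal places. The radius is a hypothetical deep-space distance chosen for illustration, not a figure from the tweet.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # plenty of working precision for this comparison

# π to 40 decimal places, hard-coded as the reference value.
PI_40 = Decimal("3.1415926535897932384626433832795028841971")
# π cut off after 15 decimal places, as in the quote.
PI_15 = Decimal("3.141592653589793")


def circumference_error_m(radius_m: Decimal) -> Decimal:
    """Error in C = 2*pi*r caused by using PI_15 instead of PI_40."""
    return 2 * radius_m * (PI_40 - PI_15)


radius = Decimal("2e13")  # hypothetical: 20 billion km, expressed in meters
error = circumference_error_m(radius)
print(f"circumference error: {error * 1000:.2f} mm")
```

On a circle tens of billions of kilometers across, the truncation costs on the order of a centimeter.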

@_ericelliott: 2 months into TDD: Tests are hard to write & brittle. 2 years in: Tests taught me better code patterns, reduce bugs 40%-80% and eliminate fear of change. 10 years in: TDD changed my life.

Rich Miller: Consultant and futurist Chetan Sharma projects the edge economy will reach $4.1 trillion by 2030. The State of the Edge 2020 report from the Linux Foundation projects that edge investment will accelerate after 2024, with the deployed global power footprint of edge IT and data center facilities forecast to reach 102,000 megawatts by 2028, with annual capital expenditures of $146 billion.

Slack: To address these limitations and to easily enable querying raw trace data, we model our traces at Slack as Causal Graphs, which is a Directed Acyclic graph of a new data structure called a SpanEvent.
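A rough sketch of what a span-event graph could look like in code. The field names here are assumptions for illustration, not Slack's published schema, and each span links to at most one parent, which gives a tree (a special case of a directed acyclic graph).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class SpanEvent:
    # Field names are illustrative assumptions, not Slack's actual schema.
    id: str
    trace_id: str
    name: str
    start_us: int
    duration_us: int
    parent_id: Optional[str] = None
    tags: Dict[str, str] = field(default_factory=dict)


def children_index(spans: List[SpanEvent]) -> Dict[Optional[str], List[SpanEvent]]:
    """Group spans by parent id; roots are stored under the key None."""
    index: Dict[Optional[str], List[SpanEvent]] = {}
    for span in spans:
        index.setdefault(span.parent_id, []).append(span)
    return index


def depth_first(spans: List[SpanEvent]) -> Iterator[Tuple[int, SpanEvent]]:
    """Yield (depth, span) pairs from the roots down, following parent -> child edges."""
    index = children_index(spans)
    stack = [(0, s) for s in reversed(index.get(None, []))]
    while stack:
        depth, span = stack.pop()
        yield depth, span
        for child in reversed(index.get(span.id, [])):
            stack.append((depth + 1, child))


trace = [
    SpanEvent("a", "t1", "http.request", 0, 900),
    SpanEvent("b", "t1", "db.query", 100, 400, parent_id="a"),
    SpanEvent("c", "t1", "cache.get", 550, 50, parent_id="a"),
    SpanEvent("d", "t1", "db.connect", 110, 80, parent_id="b"),
]
for depth, span in depth_first(trace):
    print("  " * depth + f"{span.name} ({span.duration_us}us)")
```

Keeping spans as flat records with ids and parent ids is what makes raw trace data easy to query: the graph is rebuilt only when a question needs it.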

gcommer: tl;dr of confidential computing: In normal cloud computing you are effectively trusting the cloud provider not to look at or modify your code and data. Confidential computing uses built in CPU features to prevent anyone from seeing what is going on in (a few cores of) the CPU (and in EPYC's case, encrypt all RAM accesses). Very roughly: These CPU mechanisms include the ability to provide a digital signature of the current state of the CPU and memory, signed by private keys baked into the CPU by the manufacturer. The CPU only emits this signature when in the special "secure mode", so if you receive the signature and validate it you know the exact state of the machine being run by the CPU in secure mode. You can, for example: start a minimal bootloader, remotely validate it is running securely, and only then send it a key over the network to decrypt your proprietary code. Effectively, it increases your trust in the cloud from P(cloud provider is screwing me over) to P((cloud provider AND CPU manufacturer are both working together to screw me over) ∪ (cloud provider has found and is exploiting a vulnerability in the CPU)).
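The bootloader flow gcommer describes can be sketched as a toy simulation. Loud caveat: real hardware signs its state with an asymmetric private key that the verifier never holds; this sketch substitutes an HMAC with a shared key purely to stay within the standard library, and every name in it is hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

# Stand-in for the key baked into the CPU by the manufacturer.
# Assumption: a shared HMAC key replaces the real asymmetric signing key.
CPU_KEY = os.urandom(32)


def measure(code: bytes) -> bytes:
    """Measurement of the loaded code: here simply its SHA-256 digest."""
    return hashlib.sha256(code).digest()


def cpu_attest(code: bytes, nonce: bytes) -> bytes:
    """What the simulated secure mode emits: a MAC over the measurement and a fresh nonce."""
    return hmac.new(CPU_KEY, measure(code) + nonce, hashlib.sha256).digest()


def verify_and_release(expected_code: bytes, nonce: bytes,
                       quote: bytes, secret: bytes) -> Optional[bytes]:
    """Release the decryption secret only if the quote matches the expected bootloader."""
    expected = hmac.new(CPU_KEY, measure(expected_code) + nonce, hashlib.sha256).digest()
    return secret if hmac.compare_digest(expected, quote) else None


bootloader = b"minimal-bootloader-v1"
nonce = os.urandom(16)  # fresh per request, so an old quote cannot be replayed
quote = cpu_attest(bootloader, nonce)
print("key released:", verify_and_release(bootloader, nonce, quote, b"code-key") is not None)
```

The nonce matters as much as the signature: without it, a provider could replay a quote captured from an earlier, honest boot.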

rkangel: We mostly use C rather than C++, but the same two big reasons get in the way of using Rust for everything: * Compiler support * Availability of suitable engineers

Mattman: Owner insisted on 0 downtime. Moved the server 700 feet on a cart with 2 UPSs and a chain of (3)gigabit switches. Should have been a 5 minute job if done correctly. Owner ended up paying for over 10 hours of work.

@Carnage4Life: Then iOS happened. It turns out what actually wins is user experience not openness. We all got it wrong.

@ben11kehoe: Today, people's code is local, and so they think "how can I move the cloud down here to test with my local code?" I believe the question should be "how can my local dev environment be better and more quickly manifested in the cloud?"

Mikael Ronstrom: Just a fun image from running a benchmark in the Oracle Cloud. The image above shows 6 hours of benchmark run in a data node on a Bare Metal Server. First creating the disk data tablespaces, next loading the data and finally running the benchmark. During loading the network was loaded to 1.8 GByte per second, the disks was writing 4 Gbyte per second. During the benchmark run the disks was writing 5 GByte per second in addition to reading 1.5 Gbyte per second. All this while CPUs were never loaded to more than 20 percent.

donatj: We actually restructured our entire product to win the sale of a very large customer whose users didn’t fit perfectly into our metaphor. It was unwieldy and we basically rolled the whole thing back several years later

@etherealmind: Chuck Robbins, Cisco CEO: "I think this pandemic is basically just – it's just giving us the air cover to accelerate the transition of R&D expense into cloud security, cloud collab, away from the on-prem aspects of the portfolio." Router huggers won't be happy.

@jvthing: Our entire platform at @thingco is built on one DynamoDB table and lots of Go lambda functions! Perfect combo for speed, stability and cost

Business Insider: She earns money by placing ads on her YouTube channel, and promoting products on her Instagram page (176,000 followers) and her podcast "Thick & Thin." On average, Bellotte earns between $2,400 and $5,000 for a sponsored Instagram post, she said. For an Instagram Story slide, she asks for $500 per frame. (Oct 15, 2019)

@nathankpeck: My DynamoDB usage last quarter: - 1 TB of data, >10 billion rows - 1 billion on demand read units per month - 200 million on demand write units per month - Consistent 2.5ms query latency the entire time - Cost is about $6k per month. Having such a worry free DB is priceless

@swardley: ... the best way to think of China Gov is the world's largest venture capital firm but a really good one with high levels of situational awareness and gameplay. It's tough for US policy makers to cope with because it doesn't fit into a more US view of economics.

@emollick: Over 2 billion years ago, 17 nuclear reactors started up in Gabon, Africa. They ran for almost a million years, off & on, producing 100 kilowatts of power at a time. They were entirely natural, because uranium was common.

@mipsytipsy: Metrics scale up linearly in terms of write amplification, storage and cost. Double the metrics you capture and store, and you've doubled your bill. This sucks because the only way you can really ask new questions using metrics is by defining new custom metrics upfront.

@sheeshee: today we released our new #kubernetes clusters based on #rancher on bare metal. \o/ :) sdn is #calico, new ci/cd/workflows integrated the #argo family. all metrics goes to our new tsdb #victoriametrics. storage from our new #ceph cluster. 15/10 would do own infra again. :)

@addisonsnell: A one-slide summary of @Google #TPU v3 over v2: 4x nodes; 2x matrix-multiply;+30% freq; +30% memory bw; 2x HBM memory capacity; +30% interconnect bw #HotChips2020 #AI

@chrismcatackney: 8 year old just experienced his first serious production issue. He created a block in Minecraft that started spawning dragons in an infinite loop. He ran in crying that all the megabytes were getting used up and the computer would go on fire. Welcome to software, son.

@jpietsch: At Amazon we built very large datacenter networks with OSPF almost decade ago. After leaving last year, I've learned from the industry that you can't do that with OSPF. hmmm.

@phaeria: Even Amazon has outages, this morning AWS EC2 in London region was down. One of our clients was affected, the system was built to be scalable so we redeploy & went live in 30 minutes. Instead of 2 hours of potential downtime.

@sbyrnes: An interesting observation about Google is that it succeeded in eating a lot of the Web value chain. In comparison, Facebook has slowly devoured the social media ecosystem until it was the only one left. It hasn't succeeded in branching out to other parts of the value chain. As a result, Facebook finds itself very vulnerable to regulation and likely cannot simply acquire their competition as they have in the past. For now they scale based on how much their social networks scale, but eventually are going to be vulnerable to new competition (TikTok).

@mathiasverraes: "Monolith" is the word you use when you want to blame the brokenness of your system on its size, instead of on 15 years of bad development practices.

@RosaCtrl: ”So why does NTP's support hinge so much on the shaky finances of one 59-year-old developer?” Because open source lacks political motivation besides free labour

Ryan Warrender: Relapse is real. The majority of the companies we worked with saw significant improvements in site speed and/or user engagement. However, 30–60 days post consultation (when we were no longer looking over their shoulder) we would see bad habits resurface. To avoid this pitfall, use a performance budget.

readingnews: As a systems admin of some 30 years, I have read this over and over, and even seen it firsthand. I think the real root cause of (at least in my case, the ones I have seen) of this problem is management. Leaders do not want to spend now. Leaders do not want to make a progressive plan to keep up with technology and move ahead. It costs money, and is very difficult to talk to upper-upper management about. Telling the boss to tell his boss to spend money on something "that works now" is very difficult. Few want to invest in the future when it costs now and is working now (unless its the stock market).

Andreas Zwinkau: Software that failed based on the phase of the Moon at CERN: “A few desperate engineers discovered the truth; the error turned out to be the result of a tiny change in the geometry of the 27km circumference ring, physically caused by the deformation of the Earth by the passage of the Moon!”

@BrianRoemmele: Amazon said it had received more than 3,000 requests for smart speaker user data from police earlier this year. Amazon complied with the police's requests on more than 2,000 occasions. This number marks a 72% increase from the same period in 2016—up 24% year over year.

rubiquity: Putting hubris aside, I think it's great that a decade or so into large scale computing we're starting to see patterns emerge for scaling stateful systems and be able to build good generic solutions to them. This is sorely needed especially on the control plane side which historically hasn't gotten the attention that data planes have.

@SteveSmith_81: In the past 3 years I've worked with 3 different clients using #kubernetes: Kops + AWS and GKE. In all 3 cases k8s has successfully managed workloads at scale, yet I wouldn't recommend it to any future clients. The upfront and ongoing investment is astonishing

@mweagle: “Looking back from 2020, Go has succeeded in both ways: it is widely used both inside and outside Google, and its approaches to network concurrency and software engineering have had a noticeable effect on other languages and their tools.”

msadowski: In the two years working as a Robotics Consultant, I’ve noticed a pattern: the more a potential client bargains before starting the job, the more bargaining and complaining will follow, resulting in an unpleasant experience for everyone involved.

Jana Iyengar: So where does this leave us in terms of final QUIC and HTTP/3 deployment in the world? I’ll venture to make a few predictions; note that standard disclaimers apply about any such forecasting. Looking at the landscape, I expect that we will see rapidly increased rollouts of QUIC and HTTP/3 by clients this year, as well as higher volume testing on pre-release channels first, followed eventually by clients turning QUIC and HTTP/3 on in their stable releases. Going a step further, I believe that QUIC and HTTP/3 will become the de-facto mainstream web protocol stack in 2021.

Zdenek Prikryl: China is really active right now. In fact, it is the most active territory at the moment. With RISC-V, we see a lot of traction at the universities and in the companies. Pretty much every company has some kind of RISC-V strategy. Either they have adopted RISC-V already, or plan to do that quite soon. The next one from a geographical point of view is North America. The U.S. is quite active. You can see startups working with RISC-V in AI domains because you need to have some kind of customization, and RISC-V is very well positioned for that.

Science Daily: "It's not the first step of attributing a mind to an android but the next step of 'dehumanizing' it by subtracting the idea of it having a mind that leads to the uncanny valley. Instead of just a one-shot process, it's a dynamic one."

@DrQz: For reasons known only to those who use it, the term "latency" has become regurgitated ad nauseam. In performance analysis it is a useless generic word. Successful analysis requires that it immediately be decomposed into service time, waiting time, etc.
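One way to make that decomposition concrete is the textbook single-server queue. The sketch below assumes an M/M/1 model (random arrivals, exponential service times, one server), which is an assumption added here and not something the tweet specifies. In that model the mean waiting time is W = S·ρ/(1−ρ), where S is the service time and ρ the utilization.

```python
from typing import Dict


def decompose_latency(service_time_s: float, utilization: float) -> Dict[str, float]:
    """Split mean response time into service and waiting time for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    waiting = service_time_s * utilization / (1 - utilization)
    return {
        "service": service_time_s,
        "waiting": waiting,
        "total": service_time_s + waiting,
    }


# Same 10 ms of work, very different "latency" as the server gets busier.
for rho in (0.1, 0.5, 0.9):
    parts = decompose_latency(0.010, rho)
    print(f"utilization {rho:.0%}: waiting {parts['waiting'] * 1000:.1f} ms, "
          f"total {parts['total'] * 1000:.1f} ms")
```

The service time never changes in this example; all the growth in total time comes from waiting, which is exactly what a single "latency" number hides.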

@migueldeicaza: I bet Fortnite could work in Safari without going through the AppStore. Like Confucius famously said in 500 BC: “When there is a billion dollar budget there is a way to compile the code to WebAssembly” Socrates famously retorted “the bleeding can end when the man drops his knife and stops stabbing thyself” To this day, philosophers debate whether “the man” in the quote refers to an adult or a precocious toddler. “The Republic” left enough room for interpretation.

@vgill: But the perception still sticks. I remember a few years ago a super smart VC claimed in front of 30+ people that AWS had reduced pricing 40+ times while HP and Dell had not. The entire crowd was nodding along till I mentioned that if you looked at the IOPs, storage, and CPU AWS was much further behind on price cutting you would get based on memory and Moore's observation. Awkward. An entire set of ostensibly smart decision makers hand-picked to fly to Hawaii and thought leader could not draw the lines on something this simple.

@jordannovet: Snowflake, which offers cloud-based data warehousing software, is worth about $70 billion. Teradata, which sells traditional data warehousing hardware and software, is worth $2.5 billion

@jks: I remember a database professor telling us how he had the opportunity to ask an airline systems programmer how they do distributed locking for seat allocation. "Sometimes two people get the same seat, then we might have to bump or upgrade one."

@emilyst: "your API is writing checks your program can't cache"

Jeffrey Burt: Alibaba in July introduced its first RISC-V-based product, the XT910 (the XT stands for Xuantie, which is a heavy sword made using dark iron), a 16-core design that runs between 2.0 GHz and 2.5 GHz etched in 12 nanometer processes and that includes 16-bit instructions. Alibaba claims the XT910 is the most powerful RISC-V processor to date. The company spoke more about the processor at this week’s virtual Hot Chips 2020 conference, giving an overview of the processor, an idea of how it stacks up to Arm’s Cortex-A73

Pat George: At some point we realized one’s ability to solve the Boggle challenge didn’t correspond to the person’s success here. After thinking about why there seemed to be no correlation we determined that of all the things the algorithm-type questions can tell you about a candidate, we cared about slightly different things. Additionally we decided that if none of us liked doing them during our own interviews, why would we subject our future colleagues to them?

Qnovo: The combination of new cathode materials with silicon-graphite composite anodes promise to deliver energy densities around 900 ~ 1,000 Wh/l. Yet, the vast majority of lithium ion batteries continue to ship today with graphite anodes highlighting the difficulties and the long durations needed for bringing new materials to market.

Netflix: Logs, metrics, and traces are the three pillars of observability. Metrics communicate what’s happening on a macro scale, traces illustrate the ecosystem of an isolated request, and the logs provide a detail-rich snapshot into what happened within a service.

Ed Sperling: For high-performance applications, chips are being designed based upon much more limited data movement and near-memory computing. This can be seen in floor plans where I/Os are on the perimeter of the chip rather than in the center, an approach that will increase performance by reducing the distance that data needs to travel, and consequently lower the overall power consumption....Scaling of digital logic will continue beyond 3nm using high-NA EUV, a variety of gate-all-around FETs (CFETs, nanosheet/nanowire FETs), and carbon nanotube devices...Designs are becoming both more modular and more heterogeneous, setting the stage for more customization and faster time to market. All of the major foundries and OSATs are now endorsing a chiplet strategy

Sabine Hossenfelder: This path-dependence is also why magnets can be used to store information. Path-dependence basically means that the system has a memory.

Memory Guy: In fact, if you estimate that Intel’s NSG group’s NAND profit was equal to the average of its competitors, then you can calculate XPoint losses of about $2 billion for 2017, another $2 billion for 2018, and $1.5 billion for 2019!

@MaxEpstein5: My least favorite part of programming is that triumphant feeling during debugging of "FINALLY found the bug" followed by the soul-crushing "oh wait that's not THE bug, that's just a(nother) bug"

Mark Callaghan: The paper is about one of my favorite topics, efficient performance, where the goal is to get good enough performance and then optimize for efficiency. In this case the goal is to use mostly lower-perf, lower cost storage (QLC NAND flash) with some higher-perf, higher-cost storage (NVM, SLC/TLC NAND). The simple solution to use NVM for the top levels (L0, L1), TLC for the middle levels and QLC for the max level of the LSM tree. Alas, that doesn't always work out great as the paper shows.

@dotpem: We're experimenting with the new Gravitron instance types at @honeycombio and they're saving us an ARM and a leg.

@tmclaughbos: How long has this idea of treating your entire cloud infrastructure as a monolith been around? I'm shocked at the number of times I've talked to people that use a cloudformation stack with many many nested stacks to manage their entire AWS infrastructure and applications in it.

@davidcrawshaw: Use of Reed-Solomon error correcting codes in COVID-19 pooled testing. Each sample is assigned to more than one pool, and if positivity is low enough the result is 10x throughput.
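The core idea, putting each sample into more than one pool so that pool results identify positives, can be shown with a much simpler design than Reed-Solomon codes. The sketch below uses plain row/column (grid) pooling as a stand-in: samples sit in a square grid, each row and each column is one pool, and a positive sample is located by the intersection of positive pools. It is not the scheme from the tweet.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) position of a sample in the grid


def pool_results(positives: Set[Cell], size: int) -> Tuple[List[bool], List[bool]]:
    """Simulate testing every row pool and every column pool of a size x size grid."""
    rows = [any((r, c) in positives for c in range(size)) for r in range(size)]
    cols = [any((r, c) in positives for r in range(size)) for c in range(size)]
    return rows, cols


def candidates(rows: List[bool], cols: List[bool]) -> Set[Cell]:
    """Samples that could be positive: those lying in a positive row AND a positive column."""
    return {
        (r, c)
        for r, row_hit in enumerate(rows) if row_hit
        for c, col_hit in enumerate(cols) if col_hit
    }


size = 10                       # 100 samples, 20 pooled tests
rows, cols = pool_results({(3, 7)}, size)
print("suspects:", candidates(rows, cols))
print("tests used:", 2 * size, "for", size * size, "samples")
```

With a single positive, the 20 pools pin it down exactly, a 5x saving over testing 100 samples individually. With several positives the intersections become ambiguous and need retesting, which is why the gain depends on positivity being low.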

jeffffff: yeah i've learned the hard way not to give a customer an sla without a rate limit built into it
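A rate limit like the one jeffffff wishes for is often built as a token bucket. Here is a minimal single-threaded sketch; the class name and parameters are made up for illustration, and a production limiter would also need locking and per-customer state.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock          # injectable so the limiter can be tested without sleeping
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the request fits within the limit."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_s=5, burst=10)
accepted = sum(bucket.allow() for _ in range(25))
print(f"accepted {accepted} of 25 back-to-back requests")
```

Writing the same two numbers (sustained rate and burst) into the SLA gives the customer a guarantee and the operator a ceiling.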

Bruce Dawson: If everyone on a project spends all of their time heads-down working on the features and known bugs then there are probably some easy bugs hiding in plain sight. Take some time to look through the logs, clean up compiler warnings (although, really, if you have compiler warnings you need to rethink your life choices), and spend a few minutes running a profiler. Extra points if you add custom logging, enable some new warnings, or use a profiler that nobody else does.

Frank Schirrmeister: Where is all of this going? Networks will have to become faster, storage latencies will have to go down and storage volumes will have to go up. Compute-domain specificity will increase even more. The Next Platform’s Timothy Prickett Morgan’s discussion with NVIDIA’s Jensen Huang nicely illustrates how the transformation of the data center goes well beyond just changes in the semiconductor industry design chain and the dynamics in the processor ecosystems and Tier 1 hyperscale companies who are doing their own chip design. The data center will become fully programmable. Design for power efficiency, thermal optimization and integrating multiple chiplets using 3D-IC technologies will be key enabling technologies, worthy of their own future blog.

trevor-e: There was a three-week gap since the last Xcode update and Apple is notorious for breaking stuff between Betas. We were given a single day to get everything set up (CI/CD, signing certificates, provisioning profiles, etc), update our codebase (yes there were source changes made in the GM update), and test that everything works (three weeks worth of Xcode changes), so no it's not really how you picture it to be. Having iOS14 features ready to be in sync with the iOS14 launch is pretty crucial for apps.

Atlantic Council Report: Software supply chain security remains an under-appreciated domain of national security policymaking. Working to improve the security of software supporting private sector enterprise as well as sensitive Defense and Intelligence organizations requires more coherent policy response together with industry and open source communities. This report profiles 115 attacks and disclosures against the software supply chain from the past decade to highlight the need for action and presents recommendations to both raise the cost of these attacks and limit their harm.

@mountain_ghosts: I love how bank transfers are the canonical DB transaction example when banks take days to execute them and allow dirty reads the entire time.

@shreyas: Don’t hide behind the data. Don’t wait for it to tell you what to do. You can actually validate promising ideas to death. At some point you have to go with it. Put an MVE or MVP out there. Risk something, but not much. Then measure and take the next decision.

jwr: Back when I worked at a supercomputing center, we had "operators" on duty, who were supposed to visit the machine room every 2-3h or so and check several things. It turned out that they were the major cause of hangs and reboots of our SunSITE server (a large FTP archive) — walking on the lifted datacenter floor caused vibrations which were enough to disturb the (terrible) external SCSI connectors to multiple drive arrays.

pier25: If it was closer to 5%, or even a fixed fee (eg: $1 per app sold), I would accept the narrative that it's a fee. 30% of your business is not a fee, it's more like a partnership. Apple is, in practice, a business partner to each and every iOS developer. Except that they have total and absolute control of the business. If they shut you down on the App Store, you're done. Your iOS app is worthless on any other platform. I shit you not, I've personally had apps rejected by the review board because they didn't like the screenshots. Those were screenshots of the app itself.

teh_klev: I speak as someone who's worked for 30+ years on data modelling. Every time I encounter some mongo or other non-relational DB where the company jewels (the data) are stored with no documentation, no data model etc and stuff is just shoved into these stores willy nilly it makes me weep. Start off with relational, if perf is a problem then look at denormalising, after that then consider other alternatives for special cases. But to see run-of-the-mill apps with no near future scalability issues jumping right into mongo et al from day one makes me want to run away.

mojomark: You're correct, as a marine engineer, I can tell you that the biofouling problem is far from solved. To date, copper biocide works best, but is terrible for the environment. A lot of the new coatings are also 'speed-release', meaning the hull must travel a certain speed in the water before the biofouling simply falls off. Obviously for a hull that sits stagnant in the water, like Natick, this type of coating won't help. Many people have tried to solve the issue passively, and even tried mimicking shark skin (tiny chevron-shaped scales) at the microscale to minimize biological growth's ability to "latch" to the material surface. However, I haven't seen any good commercially viable progress in practice.

distantskeptic: Open source CPU design is fundamentally different than open source software design. In the latter costs are extremely low - just the cost of a computer per developer. That developer's computer need not be replaced for years. There is no significant incremental cost for a software bug - just recompile and in a few minutes you're off to the races with a new executable which can be distributed over the internet for next to nothing. Contrast that with CPU design - every time a hardware bug is found you'd have to fix the design, verify it in software simulations, fabricate a new wafer, package it, install it in test hardware, and then perform hardware verification. This is 5 to 6 orders of magnitude slower and more expensive than software. Sure, corporations can perform this open CPU design, verification and manufacturing function. But in the end for a CPU to have a certain level of speed and reliability, you'd have to spend at least the same amount of money as the commercial CPU makers. Companies that produce an open source CPU chip are incurring huge monetary risks - and would have to be compensated for this risk if their chips have bugs and cannot be sold. The only way for an open source chip design to be remotely competitive would be if they were to embrace FPGA technology. But FPGAs run 4 times slower than purpose built ASICs and are at least 10 times more expensive per unit in volume.

snuxoll: BGP is a path-vector routing protocol, every router on the internet is constantly updating its routing tables based on paths provided by its peers to get the shortest distance to an advertised prefix. When a new route is announced it takes time to propagate through the network and for all routers in the chain to “converge” into a single coherent view. If this is indeed a reconvergence event, that would imply there’s been a cascade of route table updates that have been making their way through CTL/L3’s network - meaning many routers are missing the “correct” paths to prefixes and traffic is not going where it is supposed to, either getting stuck in a routing loop or just going to /dev/null because the next hop isn’t available. This wouldn’t be such a huge issue if downstream systems could shut down their BGP sessions with CTL and have traffic come in via other routes, but doing so is not resulting in the announcements being pulled from the Level 3 AS - something usually reflective of the CPU on the routers being overloaded processing route table updates or an issue with the BGP communication between them. Convergence time is a known bugbear of BGP.
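The path-vector idea in snuxoll's explanation can be sketched in a few lines: each route carries the list of autonomous systems it has passed through, a router discards any path that already contains its own AS number (loop prevention), and among the rest it prefers the shortest. This is a deliberate simplification. Real BGP best-path selection weighs local preference and other attributes before AS-path length, and the AS numbers below come from the private-use range.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, ...]  # sequence of AS numbers, nearest neighbor first


def best_path(own_as: int, advertised: List[Path]) -> Optional[Path]:
    """Pick the shortest loop-free AS path among those advertised by peers."""
    loop_free = [p for p in advertised if own_as not in p]
    if not loop_free:
        return None  # no usable route: traffic for this prefix is dropped
    return min(loop_free, key=len)


def readvertise(own_as: int, path: Path) -> Path:
    """Prepend our own AS before passing the route on to our peers."""
    return (own_as,) + path


OWN_AS = 65001
offers = [
    (65002, 65003, 65010),   # three hops to the origin AS 65010
    (65004, 65010),          # two hops
    (65005, 65001, 65010),   # contains our own AS: a loop, must be rejected
]
chosen = best_path(OWN_AS, offers)
print("chosen:", chosen)
print("advertised onward:", readvertise(OWN_AS, chosen))
```

Convergence is slow because every change to `chosen` triggers a new advertisement to every peer, who then rerun the same selection and advertise again.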

Tweedy et al.: Migration of cells through tissues and embryos is often steered by gradients of attractive chemicals in a process called chemotaxis. Cells are best at navigating complex routes, for which they use “self-generated chemotaxis” and create their own attractant gradients. An example of this is when neutrophils migrate into tissues to attack infection. Using modeling and live-cell data, Tweedy et al. found that self-generated chemotaxis allows cells to obtain surprising amounts of information about their environment. Cells of the slime mold Dictyostelium discoideum and mouse pancreatic cancer–derived cells were able to use the diffusion of attractants to identify the best route through complex mazes, even when the correct path was long and twisted, without ever entering incorrect paths.

Justin Pietsch: learned how to think about scaling networks and how critical it is to make things simpler. How to think about tradeoffs around magic abstractions and understandability. Infrastructure that is magic is often too good to be true, at least when you are scaling and growing very quickly. It requires deep introspection, understanding of what happens under failure, and some great monitoring...Running out of capacity on a shared resource is about the worst sin you can perform in a network. And we [Amazon] ran out of capacity a lot.

@__steele: Lambda continues to impress me. I was about to throw away some broken data, but then I decided against it. I spent 40 minutes writing some code, uploaded it, pointed it at a bucket and hit go. It downloaded, processed and reuploaded ~600GB in under two minutes. 4,000 files.

jrockway: I think relational databases have largely failed developers because they don't provide the features they actually need. A common question that comes up is how to do zero-downtime schema changes. The answer is that there isn't one. A correct implementation would store each schema version in the database and when an application connects, it would specify which version it's speaking. The developer would supply a mapping on how to make vX data available to a vY program. But no relational database supports such a feature, so people are forced to tread carefully -- look at all the deployment software that exists to attempt to find changes with database migrations and treat them differently. Look at all the software people have written to even apply those migrations. It's staggering, all because in the 70s when these systems were designed, the thought of deploying your code multiple times a day was unheard of. Another problem that comes up is transactional isolation. Most engineers, and even casual practitioners, "know" that transactions exist for cases where you want to perform multiple operations atomically. But very few of these people are running the transaction with an isolation level that provides those guarantees.
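
The versioned-schema idea in that comment can be sketched in a few lines of Python. This is an illustration of the concept only, not a feature of any existing database: the `MAPPINGS` registry, the `mapping` decorator, `read_as`, and the two example schemas (a single `name` column in v1, split into `first_name` and `last_name` in v2) are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Row = Dict[str, object]

# (stored_version, client_version) -> function that converts a row.
MAPPINGS: Dict[Tuple[int, int], Callable[[Row], Row]] = {}


def mapping(src: int, dst: int) -> Callable[[Callable[[Row], Row]], Callable[[Row], Row]]:
    """Register a developer-supplied converter from schema `src` to schema `dst`."""
    def register(fn: Callable[[Row], Row]) -> Callable[[Row], Row]:
        MAPPINGS[(src, dst)] = fn
        return fn
    return register


@mapping(1, 2)
def v1_to_v2(row: Row) -> Row:
    # v2 splits the single v1 "name" column into two columns.
    first, _, last = str(row["name"]).partition(" ")
    return {"id": row["id"], "first_name": first, "last_name": last}


@mapping(2, 1)
def v2_to_v1(row: Row) -> Row:
    # Old clients still expect the combined "name" column.
    full = f'{row["first_name"]} {row["last_name"]}'.strip()
    return {"id": row["id"], "name": full}


def read_as(row: Row, stored_version: int, client_version: int) -> Row:
    """Return `row` in the shape the connecting client says it speaks."""
    if stored_version == client_version:
        return dict(row)
    return MAPPINGS[(stored_version, client_version)](row)


if __name__ == "__main__":
    old_row = {"id": 1, "name": "Ada Lovelace"}
    print(read_as(old_row, stored_version=1, client_version=2))
```

With both directions registered, old and new application versions could read the same stored rows during a rollout, which is the zero-downtime property the comment asks for. A real system would also have to handle writes, indexes, and chains of more than two versions.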

Kevin Mitchell: In short, brain is not like muscle. Bits of brain don’t just grow with experience – they mainly change by reorganising their internal connectivity. This is just as well because if the brain did continue to grow with use, all of our brains would be busting out of our skulls. I don’t mean to be too sarcastic (just the right amount), but I’ve been going around seeing things like crazy – really intensely using my visual system for many years now – without causing massive growth of my visual cortex.

dbartholomae/lambda-middleware: a collection of middleware for AWS Lambda functions.



scality/elmerfs: a filesystem that leverages Conflict-Free Replicated Data Types (CRDTs) on top of AntidoteDB to be eventually consistent in an active-active, geo-distributed scenario.
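
For readers new to CRDTs, here is a minimal sketch of the simplest one, a grow-only counter, in Python. It is not how elmerfs or AntidoteDB implement their data types; the `GCounter` class and the replica names are hypothetical. It only illustrates why active-active replicas can accept writes independently and still end up in the same state.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Per-replica maximum is commutative, associative and idempotent,
        # so merges can arrive in any order, any number of times.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


if __name__ == "__main__":
    paris, tokyo = GCounter("paris"), GCounter("tokyo")
    paris.increment(2)   # write accepted in one region
    tokyo.increment(3)   # concurrent write in another region
    paris.merge(tokyo)
    tokyo.merge(paris)
    print(paris.value(), tokyo.value())
```

Both replicas report the same total after exchanging state, with no coordination at write time. A filesystem needs richer types than a counter, but they rest on the same merge properties.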



jakkra/Mars-Rover: build your own rover.

