Stuff The Internet Says On Scalability For June 8th, 2018

Hey, it's HighScalability time:

Slovenia. A gorgeous place to break your leg. Highly recommended.

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • 294: slides in Internet trends 2018 deck; 110 TB: Hubble Space Telescope data; $124 million: daily App store purchases; 10 billion: monthly Siri requests; 1000 billion: yearly photos taken on iOS; one exabyte: Backblaze storage by year end; 837 million: spam taken down by Facebook in Q1; 86%: of passwords are terrible; 10 Million: US patents; 72: Transceiver Radar Chip; C: most energy efficient language; $138 Billion: global games market; $50 billion: 2017 Angry Birds revenue; 50 million: cesium atomic clock time source; 1.3 Million: vCPU grid on AWS at $30,000 per hour; $296 million: Fortnite April revenue; 4000: Siri requests per second; 

  • Quotable Quotes:
    • @adriancolyer: Microsoft's ServiceFabric runs across over 160K machines and 2.5M cores, powering core Azure services as well as end-user applications. 
    • @StabbyCutyou: To summarize: Tech is not easy, but people are always harder. Until the robots come to replace us all, you won't be able to avoid dealing with people.
      Focus on the above skills, and "technology harder" as a way to increase the scope of where you can be useful to other engineers.
    • Mani Doraisamy: I would rather optimize my code than fundraise.
    • Backblaze: The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.
    • @rbranson: Things I once held as sincere beliefs: * Databases are bad queue backends * Static typing is a waste of time * Monoliths are always worse * Strong consistency isn't worth fighting for * Using the "right" tool always trumps using the one you know * The JVM sucks
    • @aleksbu: Since I switched back to SSR for my product, productivity went through the roof. SPAs has been a fun journey for me but all this added complexity makes it super expensive. My product doesn’t need any of the benefits that SPAs bring at the moment so switching to SSR was natural.
    • @ajaynairthinks: Real car convo with my 5YO: "V:Papa What's a Lambda" [after hearing my call] Me: Well its a way to run code without servers V: What's code Me: Its like algorithms [from a game he plays] V: Ohh so Lambda is cool because I know what code is and because I dunno what a server is Me: [Yes!]
    • @OpenAI: AI and Compute: Our analysis showing that the amount of compute used in the largest AI training runs has had a doubling period of 3.5 months since 2012 (net increase of 300,000x)
    • @JeffDean: TPUv3!  So hot it needs help with cooling: first liquid cooled devices in our data centers. A TPUv3 pod is 8X as powerful as the TPUv2 pod announced at IO last year, offering more than 100 petaflops of ML compute, allowing us to tackle bigger problems & build better products.
    • Platformonomics: As I keep repeating, CAPEX is both a prerequisite to play in the big boy cloud and confirmation of customer success. Both IBM and Oracle are tens of billions of dollars in cloud infrastructure CAPEX behind Amazon, Google, and Microsoft.
    • Eugene Wei: I believe the core experience of Twitter has reached most everyone in the world who likes it. Let's examine the core attributes of Twitter the product 
    • Harrison Jacobs: The joke used to be that Chinese people like to live near good public schools, Liyan Chen, the manager of international corporate affairs at Alibaba, told Business Insider. “The joke now in China is that they want to live where the Hemas are, because then they can get everything delivered to them really easily.”...The technological advancements Alibaba has brought to Hema — easy in-app ordering, ultrafast delivery, price matching, facial-recognition payment, tailored stocking based on spending habits, etc. — Amazon could easily bring to Whole Foods. And in my opinion, given Amazon's obsession with efficiency, it's a matter of not if, but when.
    • @lizthegrey: Some uses of ML today in Google production: predicting user clicks on ads, prefetching next memory/file accesses, scheduling jobs and capacity planning, speech recognition, fraud detections, smart responses, and machine vision. #SREcon
    • @lizthegrey: What's ML look like in prod? "Don't worry, it's just another data pipeline" 10% of the effort is offline training -- transforming/training prod data using TPUs, validating, and producing a trained model. We then push the trained model to prod. #SREcon
    • Joab Jackson: [Facebook] developed a tool, called Packager, which uses machine learning to automate the process of deciding which files to bundle into a package for a specific end user. It relies heavily on statistical analysis: Which files will the users need right away? Which will they need eventually? Which files have been updated? Some files get updated constantly; others not so much so.
    • @Nick_Craver: If you're an engineer, programmer, whatever or hoping to be one. Don't listen to these claims. Large projects always have tons of mistakes in them. Minimize them. Learn from them. Try not to repeat them. Move on. Remember to laugh at them later. It's part of life. Don't sweat it.
    • Mikah Sargent: The new [Apple] build system, which is completely written in Swift, is now on for all projects. It uses 20% less memory, results in two-times faster rebuilds, and reduces code size by up to 30%
    • Martin Thompson: In the IPC space, we have some of the best latency and throughput of anything out there, depending on whether you have contended publication or not. It’s possible to get in the high tens of millions of small messages per second, running over Aeron over various transports.
    • @PaulSalopek: All movement is worship: A remarkable book geo-tags the vast migrations of animals. A tern, weighing less than a cup of water, flies 40K miles. A turtle paddles 7K miles across the Atlantic. An elephant paces off a territory the size of Belgium. #EdenWalk
    • @viktorklang: The primary reason why #Scala Futures are not Cancellable is because that makes them not freely sharable. Promises are cancellable. Alice: creates Promise `p` with intent to produce a value. Bob: obtains a reference to `p` from Alice, Bob is now able to complete it Alice: can during the production of the value of `p` interrogate whether `p` has already been completed, and cancel the production of the value. Thus, Promise is a "permission to write a value"  and Future is a "permission to read a value".
    • Rick Hoskinson (LoL): You can make bold changes to a game in large-scale release, but you need to be able to roll out systemic replacements in parallel with the legacy tech. 
    • @xleem: overall in China internet connectivity failures happen around 30 times/month. 20-30 times in the local ISP network for 16-mins on average; 0-1 times the backbone network will fail averaging 2h40min; 12-20 times a month the datacenter PoP network will fail, 1h10min. #SREcon
    • Joe Emison: I think Kubernetes is ultimately sort of dead technology walking, just like mainframes, but it gets people moving in the right direction.
    • camhart: I've used both azure functions and aws lambda in production environments. Azure functions feel rushed out the door with gotchas/problems around every corner, including major stability issues. Azure functions are mid transition between v1 and v2, with v1 becoming outdated with nuget version lockin and cluttered with gotchas, but v2 is plagued with stability problems and breaking changes happening every other month. Aws lambdas have had more refinement done on them. For the time being i wouldn't recommend azure functions unless there's non technical motivations.
    • @mipsytipsy: f*ck logs.  f*ck metrics. my life got easy when i stopped peering at dashboards and trying to correlate timestamps between systems or remember which weird string i was supposed to be grepping on. all you need is events and aggregated tracing.  🎵
    • @MalwareJake: If you talk about "how stupid Intel's engineers were a decade ago" re:CPU flaws on social media without an engineering background, I think you should be banned from Twitter until you read the Intel assembly manuals cover to cover. Either that or some similarly ironic punishment.
    • How to Change Your Mind: This was the brain’s “default mode,” the network of brain structures that light up with activity when there are no demands on our attention and we have no mental task to perform. Put another way, Raichle had discovered the place where our minds go to wander—to daydream, ruminate, travel in time, reflect on ourselves, and worry. It may be through these very structures that the stream of our consciousness flows.
    • Timothy Prickett Morgan: the key message that new libraries called ExaFMM and HiCMA gives researchers the ability to operate on billion by billion matrices using machines containing only gigabytes of memory, which gives scientists a rather extraordinary new ability to run on really big data problems.
    • Simon Segars: If you go back a few years, everyone you spoke to in the chip industry was wondering what comes next. They had gone through many years where mobile was driving everything and there was tons of design work to do, and then mobile growth rates started flattening and everyone was pulling their hair out, wondering what is the next big thing. Now we have so many next big things it’s hard to know where to start. There are new communications protocols, whether it’s 5G, LoRA, Narrowband IoT, and new technologies which in themselves require a lot of innovation in semiconductor devices. You’ve got the world of AI driving chips in the cloud. There is inferencing at the edge, which is driving innovation in designs that eventually will underpin all of these technologies. The cloud itself is exploding, and there seems to be no end in sight there. And that is changing who is doing these leading-edge designs.
    • Yves Trudeau: We have seen in this post why the general perception is that ZFS under-performs compared to XFS or EXT4. The presence of B-trees for the files has a big impact on the amount of metadata ZFS needs to handle, especially when the recordsize is small. The metadata consists mostly of the non-leaf pages (or internal nodes) of the B-trees. When properly cached, the performance of ZFS is excellent. ZFS allows you to optimize the use of EBS volumes, both in term of IOPS and size when the instance has fast ephemeral storage devices. Using the ephemeral device of an i3.large instance for the ZFS L2ARC, ZFS outperformed XFS by 66%.
    • @JoeEmison: Instead of a collection of functions calling each other, the best serverless/serviceful applications are: (a) thick client code handling all interaction logic, (b) heavy use of services (e.g., AppSync, Cognito, @Auth0, @Algolia, @Cloudinary), and (c) small glue functions. 5/7
    • @Beaker: This just in: BGP can be hijacked. It’s like lane separator lines in roads. The only reason statistically that the majority of ppl don’t die in head-on collisions is because people follow the rules of the road. Like the Internet.
    • @weel: Translation: "I found some COBOL at a customer site. Fine. Mainframe. Nothing special. The last comment is from 1985. Written by my mother."
    • @copyconstruct: So many nuggets of wisdom in the new SRE Workbook from Google. "We keep our services reliable to keep our customers happy." Or as @mipsytipsy would put it, "Nines don't matter users aren't happy." Also, the fallacy of "100% reliability", how to set SLOs and much, much more.
    • @robertkluin: I am starting to think that "schemaless" just means your schema is scattered randomly throughout your code. It is almost impossible to troubleshoot anything non-trivial because there are endless *assumptions*, but few *explicit requirements*.
    • @NAChristakis: The boss is an ignoramus. 275BC edition: "Please call some of us in and listen to what we wish to tell you: There are lots of mistakes here as no one in charge knows about farming."  Egyptian papyrus via @BLMedieval & @petetoth
    • @asymco: App Store generates 10x Google Play Store revenue per device. - Morgan Stanley
    • CaedenV: Think about having a huge database, and how no amount of RAM can really hold it. So instead you keep the active bits of the DB in RAM, and the rest of it on the fastest storage possible. It use to be HDDs, then SSDs, then Optane, and now this. Much cheaper than RAM, and much faster than SSDs, this is what Optane/xpointe was supposed to be from day 1. Buy a server with a 'mere' 128GB of RAM, and 512GB of this optane and you can save a ton of money while taking virturally no performance hit. And with the ability to quickly back up RAM to the Optane you can get away with smaller battery backups and easier power recovery options.
    • Samuel K. Moore: With the foundry backing, Crossbar is hoping to market embedded ReRAM as a key to moving artificial intelligence systems into small or mobile devices such as surveillance cameras or drones. “AI is going to the edge,” says Dubois. “You cannot rely on the cloud for assisted driving or autonomous driving or even for a mobile phone.”
    • Frank Ferro: Memory has become a bottleneck again, When you look at processing right at the node, DDR isn’t keeping up. A number of applications that are emerging, whether that is AI or ADAS, require much higher memory bandwidth. This is where we’re starting to see new systems and memory architectures with 2.5D, HBM and GDDR6.
    • Backblaze: when the drives were in use a similar number of hours, the helium drives had a failure rate of 1.06% while the failure rate of the air-filled drives was 1.61%.
    • john37386: Many people don't understand how or why the linux kernel is the bottlebeck at speed higher than 10 Gbps. The problem is that the linux kernel can't process many simultaneous small connections. Just to be clear: - linux can easily transfer at 40 Gbps - linux chokes at around 1M packet per second per cpu socket. Thats right. So linux can easily transfer at 40 Gbps one or few simultaneous flows. But! It can't transfer several flows at 40 Gbps. The bottlebeck is the number of packets per seconds it can inspect.
    • Billy Tallis: The crossover isn't a single threshold. We're already at the point where a 120GB SSD is cheaper than a 120GB hard drive. That capacity point is on the rise, and in a few years SSDs will have displaced hard drives over the entire capacity range that makes sense for local storage in consumer machines.
    • @jim_dowling: Counter example. 5 years ago your facebook landing page was built by select ...group by. Now that same page load causes up to 9m machine learning model inferences per second.... You're welcome.
    • @mweagle: “Serverless looks set to deliver, for all intents and purposes, what PaaS should have been doing a decade ago” (via @Pocket) 
    • @PatrickMcFadin: IMO the ratio of free as in beer vs the free as in freedom ratio is heavily biased to... beer. This has been a talking point. I remember not too many years ago that a “feature” of OSS was the cost, because how could Facebook have grown to the size it was paying Oracle?
    • Peter Bright: Today, Intel announced Intel Optane DC Persistent Memory. This is a series of DDR4 memory sticks (with capacities of 128GB, 256GB, and 512GB) that use 3D XPoint instead of traditional DRAM cells. The result? The latency is a bit worse than real DDR 4, but the sticks are persistent. Although they use the standard DDR4 form factor, they'll only be supported on Intel's next-generation Xeon platform.
    • Xelas: Persistent RAM would be revolutionary for mobile devices if the power requirements are low enough. Keeping RAM refreshed isn;t a huge power draw compared to screens and CPUs, but it's an improvement. Eliminating wake-up and sleep cycles (and the power wasted to do so), having instant-on available for devices, and instant recovery if the battery dies to the previous state, etc would be great. The 10-fold density increase would gain back some circuit board space along with no longer requiring flash memory and ancillary circuitry. 
    • @dormando: I've always looked for the smallest change to an existing system to fix something; most people will swap it out or rewrite it. I've noticed the latter is better received. Easier to measure their "productivity", even if the whole bit is pointless?
    • JDB: Latency is a critical measure to determine whether our systems are running normally or not. Even though metrics can tell whether there is a latency issue, we need additional signals and tools to analyze the situation further. Being able to correlate diagnostics signals with RPC names, host identifiers and environmental metadata allows us to look at various different signals from a particular problem site.
    • @codinghorror: I think everyone underestimates the agility bonus startups get from not having to deal with the amassed technical debt from decisions made 5, 10, 15 years ago..
    • Jaron Lanier: It’s this thing that we were warned about. It’s this thing that we knew could happen. Norbert Wiener, who coined the term cybernetics, warned about it as a possibility. And despite all the warnings, and despite all of the cautions, we just walked right into it, and we created mass behavior-modification regimes out of our digital networks. We did it out of this desire to be both cool socialists and cool libertarians at the same time.
    • @CaseyLeask: Asynchronous over synchronous dependency, simple over complex, Infrastructure as code, contract over e2e tests, trust but verify, mttr > mtbf, have a single source of truth, security @ the start, monitoring, logging & alerting for every service, managed services for state&more
    • @mipsytipsy: Every experienced lead I know has some favorite axioms and patterns squirreled away for times like these.  Like: 〰️ Use boring software. 〰️ Don't try to design for more than 10x current scale. 〰️ Reuse components, especially storage systems. 〰️ You either have one source of truth, of multiple sources of lies. 〰️ If you want to rewrite a system, or rip & replace a piece of tech, it needs to be 10x better to justify the pain. 〰️ It's probably the network. ... etc.  (What are your favorites?  I collect these 🙃)
    • meredydd: Web frameworks are churn-y because they are incredibly leaky abstractions covering really awkward impedance mismatches. This means that they are never quite satisfactory - and that just to use one, you need to be capable of building a new one yourself. Think of a typical web app. Your data exists: 1. As rows in a database, accessed via SQL 2. As model objects on the server, accessed via method calls and attributes 3. As JSON, accessed via many HTTP endpoints with a limited set of verbs (GET/PUT/POST/DELETE) 4. As Javascript objects, accessed via (a different set of) method calls and attributes 5. As HTML tags, accessed via the DOM API 6. As pixels, styled by CSS. -- Each time you translate from one layer to the next, there's a nasty impedance mismatch. This, in turn, attracts "magic": ORMs (DB<->Object); Angular Resources (REST<->JS Object); templating engines (JS Object<->DOM); etc. Each of these translation layers shares two characteristics: (A) It is "magic": It abuses the semantics of one layer (eg DB model objects) in an attempt to interface with another (eg SQL). (B) It's a terribly leaky abstraction. This means that (a) every translation layer is prone to unintuitive failures, and (b) every advanced user of it needs to know enough to build one themselves. So when the impedance mismatch bites you on the ass, some fraction of users are going to flip the table, swear they could do better, and write their own. Which, of course, can't solve the underlying mismatch, and therefore won't be satisfactory...and so the cycle continues. Of these nasty transitions, 4/5 are associated with the front end, so the front end gets the rap.

  • What limits your growth? For Amazon it was shipping fees. So they got rid of them—eventually. Invisible asymptotes: People hate paying for shipping. They despise it. It may sound banal, even self-evident, but understanding that was, I'm convinced, so critical to much of how we unlocked growth at Amazon over the years. People don't just hate paying for shipping, they hate it to literally an irrational degree...Solving people's distaste for paying shipping fees became a multi-year effort at Amazon. Our next crack at this was Super Saver Shipping: if you placed an order of $25 or more of qualified items, which included mostly products in stock at Amazon, you'd receive free standard shipping...That brings us to Amazon Prime. This is a good time to point out that shipping physical goods isn't free. Again, self-evident, but it meant that modeling Amazon Prime could lead to widely diverging financial outcomes depending on what you thought it would do to the demand curve and average order composition...The rest, of course, is history. Or at least near-term history. It turns out that you can have people pre-pay for shipping through a program like Prime and they're incredibly happy to make the trade. And yes, on some orders, and for some customers, the financial trade may be a lossy one for the business, but on net, the dramatic shift in the demand curve is stunning and game-changing...Prime is a type of scale moat for Amazon because it isn't easy for other retailers to match from a sheer economic and logistical standpoint.

  • Want to know how messed up startups can get? The AMP Hour has an awesome, startlingly candid interview with Jeri Ellsworth. Jeri tells her story of the fall of good people trying to build a good thing in a twisted world of warped priorities and incentives. #394 – Jeri Ellsworth and the demise of CastAR.

  • How does Pinterest generate personalized, engaging, and timely recommendations from a pool of 3+ billion items to 200+ million monthly active users? Pixie dust. Pixie: a system for recommending 3+ billion items to 200+ million users in real-time: The pruned graph is generated by a Hadoop pipeline followed by a graph compiler. The graph compiler runs on a single terabyte-scale RAM machine and outputs a pruned graph in binary format. Graph generation is run once per day. Pixie is deployed on Amazon AWS r3.8xlarge machines with 244GB of RAM. The pruned Pinterest graph fits into about 120GB of main memory. At 99%-ile latency Pixie takes 60ms to produce recommendations, and a single Pixie server can serve about 1,200 recommendation requests per second. The overall cluster serves around 100,000 requests per second.
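
As a back-of-the-envelope check on those Pixie figures, here is a minimal sketch that turns the quoted throughput numbers into a rough server count. The function name is hypothetical, and the calculation ignores headroom, replication, and failover, none of which the summary above specifies.

```python
import math

def servers_needed(cluster_rps: float, per_server_rps: float) -> int:
    """Smallest number of servers that covers the load, with no headroom."""
    return math.ceil(cluster_rps / per_server_rps)

# Figures quoted above: ~100,000 requests/s overall, ~1,200 requests/s per server.
pixie_servers = servers_needed(100_000, 1_200)

# A 120GB graph on a 244GB machine occupies roughly half of RAM.
graph_fraction = 120 / 244

print(f"at least {pixie_servers} servers, graph uses {graph_fraction:.0%} of RAM")
```

Under these assumptions the cluster would need somewhere in the range of 80 to 90 machines at minimum; a real deployment would presumably run more.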

  • Can the same be said for software development? Dan Falk: In a new book entitled "Lost in Math: How Beauty Leads Physics Astray," Hossenfelder argues that many physicists working today have been led astray by mathematics — seduced by equations that might be "beautiful" or "elegant" but which lack obvious connection to the real world.

  • Should I charge My Battery to 100%? For a smartphone, my answer is “I really don’t care.”  For an electric vehicle, my answer is “yes, it is better, but may be only marginally.” For energy storage batteries used by electric utilities, my answer is “yes, absolutely.”

  • Nextdoor had a bad experience with retrofitting an app to use React Native. RN works best on greenfield apps, when you are starting from scratch. Unfortunately there's a lack of detail. Commenters chipped in saying WalMart Labs has had better success. Pinterest had a good experience. Supporting React Native at Pinterest: Using React Native, the initial implementation on iOS took about 10 days, including bootstrapping all the integrations into our existing infrastructure. We were then able to port the screen over to Android in two days with 100 percent shared UI code between the platforms, saving more than a week of implementation time.

  • Great thread on negotiating your pay package by @tsunamino. I've seen this many times. By not negotiating you leave a lot on the table. You almost never have more power than when you're hired. Leverage it. You might be shocked at the deals your peers have negotiated simply because they asked. How would you like your moving costs paid for? Or the first few months of your mortgage? ICOs. RCUs. More paid time off. Employee discounts. Retirement plan benefits. Day care. Health benefits. Ask. Walk away. See what happens.

  • Airbnb shares how they're building their version of a service oriented architecture: Building Services at Airbnb, Part 1, Building Services at Airbnb, Part 2, and Reconciling GraphQL and Thrift at Airbnb. Once a startup has gone through its youthful growth spurt and surly rebellious adolescence, it's time to grow up. And growing up means you want boring things like stability, order, predictability, standards. That's what Airbnb wants with their infrastructure and they are using services to get it. Services are the equivalent of the comfortably large second home in the 'burbs. Service standards always start with an IDL. Airbnb chose Thrift. In a smart system everything is generated from the IDL: "service-side code and RPC clients are auto-generated with Airbnb service platform’s standard instrumentations that enforces Airbnb’s infrastructure standards and practices." A request context is elaborated with useful details like user_id, locale, etc. Standard metrics are required for each service as are standard alerts: "standard service alerts include high p95 latency, high p99 latency, high error rate, and low QPS alerts at the method level for each method, and high error rate and low QPS alerts at the service level." For the API they went with GraphQL and Apollo because GraphQL is strongly typed, offers flexible field selection, and is cross platform, and because of the considerable world of client-side benefits that the Apollo ecosystem has to offer, including caching, synthesizing local state and network state, field-level analytics, and more.
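
To make the "everything is generated from the service definition" idea concrete, here is a minimal sketch of deriving the standard alert set described in the quote above. It is not Airbnb's tooling: `ServiceSpec`, `standard_alerts`, the alert names, and the example service are all hypothetical stand-ins for what a real generator would read from a Thrift IDL file.

```python
from dataclasses import dataclass

# Alert kinds taken from the quote: four per method, two per service.
METHOD_ALERTS = ("high_p95_latency", "high_p99_latency", "high_error_rate", "low_qps")
SERVICE_ALERTS = ("high_error_rate", "low_qps")

@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical stand-in for a parsed IDL service definition."""
    name: str
    methods: tuple

def standard_alerts(spec: ServiceSpec) -> list:
    """Return one alert name per (scope, alert kind) pair."""
    alerts = [f"{spec.name}.{kind}" for kind in SERVICE_ALERTS]
    for method in spec.methods:
        alerts.extend(f"{spec.name}.{method}.{kind}" for kind in METHOD_ALERTS)
    return alerts

# Example: a made-up service with two methods yields 2 + 2 * 4 alerts.
spec = ServiceSpec(name="listings", methods=("getListing", "searchListings"))
alerts = standard_alerts(spec)
print(len(alerts), alerts[:3])
```

The point of the design is that a service owner never writes these alerts by hand; adding a method to the definition is enough to get the standard coverage.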

  • Nice gloss of deconstruct conf 2018, a little conference with no sponsors, a single track, no lunch, no public schedule, and no particular focus except computing.

  • Zed Shaw in a revealing Changelog interview proposes fixing bugs on open source projects as a good way to interview programmers. Programmers spend most of their time fixing bugs, so why not see if they can find and fix bugs on a real project rather than reversing a red-black tree? Also, interesting observations on the current state of open source. Big companies grind down open source developers to keep their costs low with free resources and to commoditise the complement. Companies are making billions and open source developers have to beg for money to fund their funerals. He asks a good question. Why is it OK for companies to make money using open source, but when developers choose the GPL so they can make a buck they are vilified? There seems to be a conspiracy to keep open source developers from making money. There's a whole other side to open source. He got death threats for criticizing projects. He doesn't blame corporations, it's the programmers that let them. Programmers are just a bunch of servile fascists. Zed predicts people will just stop making open source.

  • Videos from GopherCon Singapore 2018 are now available.

  • 6 things I’ve learned in my first 6 months using serverless: Lesson #1 — Ditch Python; Lesson #2 — Burn the middle layer to the ground; Lesson #3 — Enjoy the Vue; Lesson #4 — Learn to love DynamoDB; Lesson #5 — Serverless Framework FTW; Lesson #6 — Authorization is the new sheriff in town.

  • You don't think of Fender (guitars) as having a software arm, but they do. They have an online digital learning product to help teach people how to play the guitar: The Cloudcast #348 - Bringing Serverless to Rock 'n Roll. They built a learning management system and a CMS on AWS Lambda. Obviates the need for an event bus middleware product. Christmas traffic spikes are handled no problem. Lessons: avoid VPC if possible, it increases cold start times; if you do use VPC implement concurrency control because there are limits at the account level; event based thinking is something you have to wrap your head around, don't poll, fire events internally to trigger everything; for asynchronously invoked functions use the dead letter queue in case the event gets dropped; use honeycomb.io sample events logged into ELK.

  • NoOps was just the start. Is NoBackend the future? Joe Emison is not shy with his radical transvaluation of traditional development values. Why invest in backend developers when your customers don’t care?
    • I only hire front-end developers at this point. I have been the sole DevOps/backend/middle-tier developer in the last two companies I started, and it’s been fine. That’s sort of how I limit the amount of back-end code that gets written, and force everything to be defined in configuration. We try to actively incentivize people to write as little code as possible...
    • I’d much prefer to approach serverless this way: the back end and the middle tier are undifferentiated heavy lifting. The front end is where all the business value lives, because it’s where all the client interaction is. And if we take that view, and we say let’s optimize our organizations for these great front-end customer-facing experiences, we ask: how can we spend as little time and effort and money on the back end and still have it work and scale?
    • And now, of course, we have AWS AppSync, which I think is the next generation of what Firebase and Parse were trying to do. I’m building out a tech infrastructure using AppSync now, and I’m fairly confident that, where in the past I’ve been able to run about ten front-end developers with just me doing the backend and middle-tier development part-time, with AppSync I could continue to be the CTO, do all the backend and middle-tier code, and support up to probably about a hundred front-end developers before I would need a dedicated backend developer.
    • Now you can use DynamoDB streams and Athena and ElasticSearch … you have Glue to do transformations on the data… I feel like AppSync was like putting a main piece in the middle of a puzzle. It connects a bunch of things together.

  • "Running a bootstrapped startup that serves 100 million users with three developers was unheard of just five years ago. The day has come when a bootstrapped startup can build a large-scale application like Gmail or Salesforce." How? Serverless. Scale big while staying small with serverless on GCP — the Guesswork.co story. The interesting bit is they started on App Engine and moved to Cloud Functions to lower costs. Firestore replaced both App Engine and Datastore. Cloud Functions was used to trigger price/stock alerts and synchronize data between Firestore, BigQuery and their recommendation engine. They were able to seamlessly scale from one to 20 million users on Black Friday using a small engineering team. Costs were lowered by optimizing their algorithm. The simpler and more intuitive the algorithm, the better it performed.

  • Using S3 Select you can reduce data transfer costs by 97.5% according to 8 New Features Recently Announced by AWS That Help Reduce Spend.

  • A tale of two Lambdas — Solving Event Sourcing at GO-JEK. Found the JRuby + Sidekiq solution was slower, more expensive, and less scalable than the Clojure + AWS Lambda solution.

  • Ask HN: How is DDoS protection implemented? tptacek: I was lead developer on Arbor Network's DDoS product in the early 2000s (I left in 2005 to start Matasano Security). My information on this is surely dated, but people seem to still be using the same terminology now as then. You can break down DDoS into roughly three categories: 1. Volumetric (brute force) 2. Application (targeting specific app endpoints). 3. Protocol (exploiting protocol vulnerabilities). DDoS mitigation providers concentrate on 1 & 3. The basic idea is: attempt to characterize the malicious traffic if you can, and or divert all traffic for the target. Send the diverted traffic to a regional "scrubbing center"; dirty traffic in, clean traffic out. The scrubbing centers buy or build mitigation boxes that take large volumes of traffic in and then do heuristic checks (liveness of sender, protocol anomalies, special queueing) before passing it to the target. There's some in-line layer 7 filtering happening, and there's continuous source characterization happening to basic network layer filters back towards ingress. You can do pretty simple statistical anomaly models and get pretty far with attacker source classification, and to track targets and be selective about what things need to be diverted. A lot of major volumetric attacks are, at the network layer, pretty unsophisticated; they're things like memcached or NTP floods. When you're special-casing traffic to a particular target through a scrubbing center, it's pretty easy to strip that kind of stuff off.
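
The "pretty simple statistical anomaly models" mentioned above might look something like the following sketch. It is an illustration only, not Arbor's or any vendor's method: it flags sources whose packet count in one measurement interval sits far above the median, using a robust z-score based on the median absolute deviation (MAD). The function name, the threshold of 6, and the sample addresses are all assumptions.

```python
from statistics import median

def suspicious_sources(packet_counts: dict, threshold: float = 6.0) -> set:
    """Flag sources whose packet count is far above the typical source.

    Uses (count - median) / MAD as a robust z-score, so a handful of
    very loud attackers cannot drag the baseline up with them.
    """
    counts = list(packet_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    scale = mad or 1.0  # avoid dividing by zero when all sources look alike
    return {src for src, c in packet_counts.items() if (c - med) / scale > threshold}

# Packets seen per source during one interval (made-up numbers).
counts = {
    "10.0.0.1": 120,
    "10.0.0.2": 95,
    "10.0.0.3": 110,
    "10.0.0.4": 105,
    "203.0.113.7": 50_000,
}
flagged = suspicious_sources(counts)
print(flagged)
```

A production system would track many more features than raw packet counts, but even a model this crude separates a flood source from ordinary clients.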

  • As advertised. The Definitive Guide to Linux System Calls.

  • Awesome detective story. Code that debugs itself: Fixing a deadlock with a watchdog. Finding deadlocks is definitely a situation where you need a time machine rather than a debugger. When a process starts it's always the wild west time. You're never quite sure what's starting or in what order it will start. You never want anything to start magically. Everything should start in a specific dependency order, even infrastructure components like GAE or language environments like Python.
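
For flavor, a watchdog of this general kind can be sketched in a few lines of Python. This is not the code from the article; the `Watchdog` class, its `beat`/`check` methods, and the injectable clock are hypothetical names chosen for the example. The idea is simply that working code calls `beat()` regularly, and if the beats stop for too long the watchdog captures every thread's stack so you can see who is stuck waiting on whom.

```python
import sys
import threading
import time
import traceback


def dump_all_stacks():
    """Return a text report with the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(ident, '?')} ({ident}):")
        lines.extend(traceback.format_stack(frame))
    return "\n".join(lines)


class Watchdog:
    def __init__(self, timeout, on_stall, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence treated as a stall
        self.on_stall = on_stall    # callback that receives the stack report
        self.clock = clock          # injectable so the logic can be tested
        self.last_beat = clock()

    def beat(self):
        """Called by the monitored code to signal it is still making progress."""
        self.last_beat = self.clock()

    def check(self):
        """Report once if no beat arrived within the timeout."""
        if self.clock() - self.last_beat > self.timeout:
            self.on_stall(dump_all_stacks())
            self.beat()  # restart the timer so one stall is not reported in a loop
            return True
        return False

    def start(self, interval=1.0):
        """Poll check() from a daemon thread."""
        def loop():
            while True:
                time.sleep(interval)
                self.check()
        threading.Thread(target=loop, daemon=True, name="watchdog").start()
```

In a real service `on_stall` would write to the log. The stack dump taken at the moment of the stall is the closest thing to a time machine you get.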

  • The butterfly effect is real, even in games. Though static analyzers are pretty good at finding potential uninitialized variables. DETERMINISM IN LEAGUE OF LEGENDS: FIXING DIVERGENCES: What’s amazing is that the seeds of this divergence originated long before the assignment of the invalid y-coordinate. The divergence occurred in our network level of detail system - code we use to determine if a minion pathing update should be sent to a specific player based on the position of their camera. Talk about subtle! The fix was trivial variable initialization, but there’s little chance we would have found the core issue by brute force code inspection. Only through creative instrumentation were we able to quickly drive toward a solution.

  • On vacation I heard many stories about traditional skills going extinct because the youths would rather do things the modern way. One example is smithing with coal. There's a similar problem with languages going extinct as their last native speakers pass away. I was thinking of a sort of Disneyland where robots carried out these skills ad infinitum. That way they would never really be lost. Of course the dystopian writer in me immediately thought about how the human population would go extinct for some horrible reason and when the aliens land all they would find are these creepy dioramas of humans past.

  • This has been common wisdom. We'll see for how long with new products like Amazon Aurora. Scalability Tip: Move business logic out of DB: In our effort to scale, we have found that moving business logic out of the database and, instead, running the logic through code worked well. Also, Implementing HyperLogLog in Redshift and Tableau: Because our data volume is scaling much faster than we anticipated, this trade-off has recently started to degrade the load times of many of our dashboards (from a few seconds to around 20–30 seconds every time a filter is applied). We learned one of the root causes for this slow-down was metrics that included distinct counts. Since these metrics are required for unaggregated data sources, we started exploring alternatives to improve our Tableau performance. Instacart’s license of Tableau does not currently provide any such functionality. We implemented HyperLogLog, a probabilistic counting algorithm, to create data extracts in Tableau. 
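
If you haven't met HyperLogLog before, here is a minimal Python sketch of the algorithm itself. It is not the Redshift/Tableau implementation the article describes; the class name, the choice of SHA-1 as the hash, and the default of 2^12 registers are assumptions for illustration. The point is that a few kilobytes of registers give an approximate distinct count, and that sketches built over separate slices of data can be merged.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        # p index bits -> m = 2**p registers; the alpha formula below assumes m >= 128.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        x = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        idx = x >> (64 - self.p)                     # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Combine two sketches; the result equals a sketch of the union."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for user_id in range(50000):
        hll.add(user_id)
    print(round(hll.count()))
```

With 4,096 registers the typical error is around 1.6%. The trade is a small loss of accuracy for distinct counts that can be pre-aggregated and merged instead of recomputed on every filter change.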

  • Some people share recipes for their world famous chocolate chip cookies or a light and refreshing summer salsa. Chip Overclock not so much. He brings An Easy Home-Brew Stratum-1 GPS-Disciplined NTP Server to the potluck.

  • Having integrated analysis tools into many build tool chains, Google's experience here rings true. Programmers hate spending time just making tools shut up. "Our most important insight is that careful developer workflow integration is key for static analysis tool adoption. We advocate for a system focused on pushing workflow integration as early as possible." Lessons from Building Static Analysis Tools at Google: Since finding bugs is easy, Google uses simple tooling to detect bug patterns. Analysis writers then tune the checks based on results from running over Google code...Most developers will not go out of their way to use static analysis tools...To ensure that most or all engineers see static-analysis warnings, analysis tools must be integrated into the workflow and enabled by default for everyone...Developer happiness is key...Engineers working on static analysis must demonstrate impact through hard data. For a static analysis project to succeed, developers must feel they benefit from and enjoy using it...Developers need to build trust in analysis tools. If a tool wastes developer time with false positives and low-priority issues, developers will lose faith and ignore results...Do not just find bugs, fix them...Focusing on fixing bugs has ensured that tools provide actionable advice and minimize false positives...Crowdsource analysis development...Teams like Tricorder now focus on lowering the bar to developer-contributed checks, without requiring prior static analysis experience.

  • Nice step-by-step instructions on how to deploy a serverless multi-region, active-active backend inside an Amazon VPC (Virtual Private Cloud). Build a serverless multi-region, active-active backend solution — within a VPC. Why bother? Resources inside the VPC cannot be accessed from the public Internet.

  • Excellent description of WAL and replication. High availability and scalable reads in PostgreSQL: PostgreSQL already natively supports two of those requirements, higher read performance and high-availability, via a feature called streaming replication. So if your workload peaks below 50,000 inserts a second (e.g., on a setup with 8 cores and 32GB memory), then you should have no problems scaling with PostgreSQL using streaming replication.

  • Good example of using Kubernetes and the Argo Workflow Manager to extract chunks of OpenStreetMap data. Producing 200 OpenStreetMap extracts in 35 minutes using a scalable data workflow: Currently, we run about 25 tasks of 8 extracts each. With each task requesting an 8 CPU node, generating all 200 extracts in parallel takes about 35 minutes. This workflow also allows us to run multiple data pipelines together, sharing common outputs and reducing the amount of work that has to be duplicated. For instance, the updated planet file generated in the first step above is also used as the input to our Valhalla workflow.

  • A lot can change in 40 years. In 1970 the top retailers included Sears, Pennys, Kmart, Woolworth. Now? Amazon, Walmart, Costco, Home Depot, CVS, Target. Who will be the winners in 40 years? Likely companies that haven't even been created yet. @laurenthomasx3.

  • Multi-Cloud Continuous Delivery with Spinnaker: At Netflix, we’ve built and use Spinnaker as a platform for continuous integration and delivery. It’s used to deploy over 95% of Netflix infrastructure in AWS, comprised of hundreds of microservices and thousands of deployments every day. Encoded within Spinnaker are best practices for high availability, as well as integrations with Netflix tools like Chaos Monkey, ChAP (Chaos Automation Platform), Archaius, Automated Canary Analysis and Titus. With Spinnaker, developers at Netflix build and manage pipelines that automate their delivery process to cloud VMs, containers, CDNs and even hardware OpenConnect devices.

  • trailofbits/echidna (article): a Haskell library designed for fuzzing/property based testing of EVM code. Currently it is quite alpha, and the API isn't guaranteed to be functional, let alone stable. It supports relatively sophisticated grammar-based fuzzing campaigns to falsify a variety of predicates.

  • Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems: This article presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies and is able to provide a simple and efficient strategy that scales up to handle exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations, drastically reducing the required amount of randomness while delivering a perfect load distribution.
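
The core idea, an interval table over [0, 1) plus a hash, is easy to sketch. The Python below is a simplified illustration and not the article's algorithm: the `SliceTable` name is made up, devices are assumed to have equal capacity and unique names, and no attempt is made to keep the table small (here every interval is split on each insert, which the real strategy is careful to avoid).

```python
import bisect
import hashlib


class SliceTable:
    def __init__(self):
        self.starts = []   # sorted start points of intervals in [0, 1)
        self.owners = []   # owners[i] owns [starts[i], next start or 1.0)

    def add_device(self, device):
        """Give the new device an equal share by slicing the tail off every interval."""
        existing = len(set(self.owners))
        if existing == 0:
            self.starts, self.owners = [0.0], [device]
            return
        give = 1.0 / (existing + 1)   # fraction each interval hands over
        bounds = self.starts + [1.0]
        starts, owners = [], []
        for i, owner in enumerate(self.owners):
            lo, hi = bounds[i], bounds[i + 1]
            cut = hi - (hi - lo) * give
            starts += [lo, cut]
            owners += [owner, device]
        self.starts, self.owners = starts, owners

    def lookup(self, key):
        """Hash the key to a point in [0, 1) and find the interval containing it."""
        digest = hashlib.sha256(key.encode()).digest()
        point = int.from_bytes(digest[:8], "big") / 2 ** 64
        return self.owners[bisect.bisect_right(self.starts, point) - 1]

    def shares(self):
        """Total fraction of the hash space owned by each device."""
        bounds = self.starts + [1.0]
        totals = {}
        for i, owner in enumerate(self.owners):
            totals[owner] = totals.get(owner, 0.0) + bounds[i + 1] - bounds[i]
        return totals
```

Two properties are visible even in this toy version: every device ends up with the same share of the hash space, and when a device is added the only keys that move are the ones that land on the new device.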

  • The entropic brain: a theory of conscious states informed by neuroimaging research with psychedelic drugs: This article proposes that a distinction can be made between two fundamentally different modes of cognition: primary and secondary consciousness. Primary consciousness is associated with unconstrained cognition and less ordered (higher-entropy) neurodynamics, whereas secondary consciousness is associated with constrained cognition and more ordered neurodynamics (i.e., that strikes an evolutionarily advantageous balance between order and disorder - that may or may not be perfectly “critical”). It is hoped that this mechanistic model will help catalyze a synthesis between psychoanalytic theory and cognitive neuroscience that can be mutually beneficial to both disciplines.

**Westworld spoilers**

In one scene, Dolores is used to establish the fidelity of Bernard's most recent build because she has the most knowledge of what he was like IRL. Basically she's training him to become him again, a process that takes years.

My thought was this is a great example of adversarial machine learning. Both of them interacting together bootstrap each other into sentience. Maybe I'm reading too much into it, but it's the same idea I used in my story The Strange Trial of Ciri, so it was fun to see on the show.

I'm not sure most people would have noticed this aspect of their relationship.