
Stuff The Internet Says On Scalability For September 15th, 2017

Hey, it's HighScalability time: 


Earth received Cassini’s final signal at 7:55am ET. Let's bid a fond farewell. After a 13-year tour of duty, job well done!


If you like this sort of Stuff then please support me on Patreon.


  • 12.9 million: DynamoDB requests per second on Prime Day; 4 billion: transistors on Apple's A11 Bionic chip; 4x: extreme weather events since 1970; 51: qubit device; 50%: converted to Reason; 56.6 million: US cord cutters; 5000: bikes abandoned at Burning Man; 500 million: yearly visitors to Apple stores; 30 min: time to send one HD color image from Mars to Earth;

  • Quotable Quotes:
    • @randyshoup: Interesting idea of a *Negative* MTTR by @adrianco: notice something is going to fail and proactively fix it before it breaks!
    • @rob_pike: "The Equifax executives who let my data be stolen will probably suffer fewer consequences than I will for an overdue library book." @nytimes
    • @avantgame: on weaponized social media: "We’re in an information war with Russia. It’s time we started acting like it."
    • Jamie Dimon: It's [Bitcoin] worse than tulip bulbs. It won't end well. Someone is going to get killed
    • @manisha72617183: First they tell you that Scrum is not a magic bullet. Then they spend the rest of the time saying how it’s the best thing since sliced bread🙄
    • yogthos: My team has been using Clojure for 7 years now, and we're very happy with it. It's still a pleasure to work with, and the stability of the language has been really welcome.
    • @GossiTheDog: Another way of looking at Equifax is they did an incredible job of keeping infrastructure that size with that much legacy secure for so long
    • API Evangelist: when it comes to the sheer volume and regular drumbeat of serverless stories, Microsoft is keeping pace. After watching several months of sustained storytelling, it looks like they could even pass up Amazon in the near future.
    • amelius: Well, I hear a lot of people complaining that the results on DuckDuckGo are still worse than on Google, even though both search-engines produce results within a second. And these are people that really want to quit using Google for privacy reasons. I never hear people complaining that a search is slow. So I do think that search-quality is where the competition is happening.
    • @SwiftOnSecurity: How you think multinational hypercorps get hacked: NSA 0days on the black market How multinational hypercorps get hacked: admin/admin
    • m-masa: Snapchat to me is sharing your shaky drunken escapades at 3AM with your friends to let them know you made it home and survived the night. Instagram seems more like an endless observation of copy-and-paste, superficial things and people and places. It's evolved more into a (usually inaccurate) portrayal of status than anything else.
    • Dmitri Zimine: When Serverless replaces micro-services, it is not going to be free lunch either. We are paying by introducing more complexity, now for the benefit of massive cost savings.
    • Kris De Decker: In London, a solar panel produces 65 times less energy on a heavy overcast day in December at 10 am than on a sunny day in June at noon
    • nostrademons: The real interesting work in search is in ranking functions, and this is where nobody comes close to Google. Some of this, as other commenters note, is because Google has more data than anyone else. Some of it is just because there've been more man-hours poured into it. IMHO, it's pretty doubtful that an open-source project could attract that sort of focused knowledge-work (trust me; it's pretty laborious) when Google will pay half a mil per year for skilled information-retrieval Ph.Ds.
    • rkangel: Up to now this is all classic Eve - betrayal by people you trust. The postscript is less nice though: gigx in a moment of anger asked in in-game chat for real life contact details for TheJudge so that he could 'cut off his hands'. This is obviously not OK and CCP banned gigx permanently. This has the side-effect of putting the final nail in the CO2 coffin.
    • Rick Altherr: At one point I did that calculation and I was seeing one hard drive die every five minutes. 
    • EliE: the fundamental reason why ransomware is so successful, and here to stay, is that people simply don’t backup their data.
    • EliE: no matter how many times the bitcoins are moved, ultimately they must be cashed out at exchange points. So we just need to keep tracing movements until we reach a cash-out wallet.
    • @radjanirad: Just a few hours ago, Cassini received the command to turn off the RADAR instrument - for the last time. :( #cassini
    • @postwait: Most monitoring "innovations" have been mostly aesthetic, but their marketing is deafening and drowns out real innovation. #UphillBattle
    • pab: I have two years experience pair programming, and to quote asthasr, I found it an absolute slog.
    • @matthew_d_green: I have an idea. Let's combine all the hard parts of cryptography with all the asshole parts of the finance industry.
    • Pete Saia: It’s important to understand that it isn’t all or nothing. Serverless is in our future, but it isn’t our exclusive future.
    • Errata Security: The 9,000 devices were split almost evenly between Apple and Android. Almost all of the Apple devices randomized their addresses. About a third of the Android devices randomized. (This assumes Android only randomizes the final 3 bytes of the address, and that Apple randomizes all 6 bytes -- my assumption may be wrong).
    • David Rosenthal: Today's eclipse records would be on the Web, not paper or bone. Will astronomers 3200 or even only 580 years from now be able to use them?
    • Peter Zaitsev: To be competitive with non-open-source cloud deployment options, open source databases need to invest in “ease-of-use.” There is no tolerance for complexity in many development teams as we move to “ops-less” deployment models.
    • Jeremy Hsu: the advantage of the flip-flop qubit comes from inducing an electric dipole—separation of positive and negative charges—by pulling the electron a little bit away from the nucleus of the phosphorus atoms (which are themselves embedded in silicon). That electric dipole enables the spin-based silicon qubits to remain entangled together over longer distances and able to influence one another through quantum physics.
    • Cory Doctorow: All these forms of cheating treat the owner of the device as an enemy of the company that made or sold it, to be thwarted, tricked, or forced into con­ducting their affairs in the best interest of the com­pany’s shareholders. To do this, they run programs and processes that attempt to hide themselves and their nature from their owners, and proxies for their owners (like reviewers and researchers).
    • Jonathan Golden: How do you know, though, when to pull resources away from other growth initiatives to address these edge cases? My rule of thumb was when a problem was occurring at least 50 times a day, it was time to solve it more holistically. At a time when we were growing anywhere from 300%–600% per year — and edge cases were growing at least as fast — that’s when the potential explosion of problems proliferated.
    • A Mind at Play: Well, the good of this command is that if you’re in a loop you can have this command in that loop and every time it goes around the loop it will put a pulse in and you will hear a frequency equal to how long it takes to go around that loop. And then you can put another one in some bigger loop and so on. And so you’ll hear all of this coming on and you’ll hear this “boo boo boo boo boo boo,” and his concept was that you would soon learn to listen to that and know whether when it got hung up in a loop or something else or what it was doing all this time, which he’d never been able to tell before.

  • Remember all those old makeover shows on TV? They'd take a woman, and it was almost always a woman, and give her a new look. She'd be thrilled. Finally, access to the same high priced professional used by the rich and famous. Hair styling with the current hot stylist. Check. Fix teeth, whiten and brighten. Check. Facial to make skin luminous. Check. Hide what needs hiding. Check. Clothes shopping with a famous stylist. Check. Maybe even some plastic surgery. The whole nine yards. The result was almost always a bland sameness. Even if everything was different, the women would all look alike. They'd have the same white teeth, the same styled hair, the same styled clothes. Whatever made them individuals before, washed away during the styling process. I'm afraid, in an algorithm-driven world, we'll mold ourselves to make algorithms happy, to get the reward only happy algorithms can dispense. On Amazon we see this with books. The Amazon algorithm rewards speed and momentum in writing. The more books a writer produces, the higher Amazon will rank an author and their books. That's money in the bank. The better a writer is at writing to market, the better Amazon can match writers with readers. That's money in the bank. Write a book that's hard to categorize and it will be hard to sell. Day after day, millions of algorithm-driven signals will nudge us onto the same stylized path. And we'll all ooh and aah, saying how beautiful everything looks, but in the back of our minds we'll know something unique has been lost too.

  • Videos & Slides from the Kafka Summit San Francisco 2017 are now available.

  • Amazon was busy busy busy on Prime Day. The clear pitch is if we can easily handle this much load on our infrastructure, then you can too. Prime Day 2017 – Powered by AWS: Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests rising to 835 million (a 30% increase). The team told me that they loved the elasticity of EBS, and that they were able to ramp down on capacity after Prime Day concluded instead of being stuck with it... Amazon DynamoDB requests from Alexa, the sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet needs of Prime Day without breaking a sweat...Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources on line... AWS CloudTrail processed over 50 billion events and tracked more than 419 billion calls to various AWS APIs, all in support of Prime Day...AWS Config generated over 14 million Configuration items for AWS resources.

  • Has FaaS adoption really been slower than container adoption? 5 Interesting Findings About Serverless: FaaS adoption has been slower than containers; It’s all about the $$$; Serverless isn’t here, but it’s coming; Vendor lock-in remains a top concern; Parse may be gone but its scar tissue remains.

  • Automation replaced 800,000 workers… then created 3.5 million new jobs: A Deloitte study of automation in the U.K. found that 800,000 low-skilled jobs were eliminated as the result of AI and other automation technologies. But get this: 3.5 million new jobs were created as well, and those jobs paid on average nearly $13,000 more per year than the ones that were lost.

  • Why is Python Growing So Quickly? monkmartinez: Really easy to answer: It is stupid fun to program with Python. There are libraries for everything you can imagine and they are generally very easy to use. Once you grok the virtual environment thing, writing apps/programs/scripts is just a matter of creating a new env and installing the libs you need. Testing ideas with Jupyter notebook is fun, fast and rewarding. Pycharm is awesome. VSCode with Python just works. You can automate tons of boring stuff (Thanks Al!)... and the list goes on... and on...

  • How does Uber better accommodate rider demand during high-traffic intervals? With a new end-to-end Bayesian neural network (BNN) architecture that more accurately forecasts time series predictions and uncertainty estimations at scale. Engineering Uncertainty Estimation in Neural Networks for Time Series Prediction at Uber: more accurately forecasts time series predictions and uncertainty estimations at scale. We also discuss how Uber has successfully applied this model to large-scale time series anomaly detection, enabling us to better accommodate rider demand during high-traffic intervals.

  • "The purpose of this series is to help people to get basic vocabulary and understand how databases are ticking from the bottom layers up to make it easier to look into their subsystems, tune, optimise and pick the right tool for the right job." On Disk IO, Part 2: More Flavours of IO. Learn all about Standard IO, Direct Memory Access, Page Cache, buffering, vectored IO, memory mapping, Page Cache Optimisations, AIO, mmap, mlock, fadvise. There's a pattern: using DMA will most likely require you to write a buffer cache, using Kernel Buffer Cache will require using fadvise and using AIO might require you to hook it up to some Futures-like interface. 
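
To make a few of those IO flavours concrete, here is a minimal Python sketch that reads the same file through a plain read, a vectored read, and a memory mapping, after giving the kernel an fadvise hint. The function name `demo_io_flavours` and the 5-byte header split are made up for illustration, and `posix_fadvise` and `preadv` only exist on some Unix platforms, so both are guarded.

```python
import mmap
import os
import tempfile


def demo_io_flavours(path: str) -> dict:
    """Read one file three ways and return the bytes each approach saw."""
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size

        # fadvise: tell the kernel page cache we intend to read sequentially.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_SEQUENTIAL)

        # Standard IO: a plain read that is served through the page cache.
        results["read"] = os.read(fd, size)

        # Vectored IO: one syscall scatters the data into several buffers.
        # The 5-byte "header" split is an arbitrary, hypothetical layout.
        if hasattr(os, "preadv") and size > 5:
            header, body = bytearray(5), bytearray(size - 5)
            os.preadv(fd, [header, body], 0)
            results["vectored"] = bytes(header) + bytes(body)

        # Memory mapping: the file's pages show up as a byte buffer.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
            results["mmap"] = mapped[:]
    finally:
        os.close(fd)
    return results
```

All three paths return the same bytes; what differs is how many copies and syscalls sit between the disk and your buffer, which is the trade-off the series walks through.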

  • Can security and data privacy drive users to Azure? Introducing Azure confidential computing: Confidential computing ensures that when data is “in the clear,” which is required for efficient processing, the data is protected inside a Trusted Execution Environment (TEE - also known as an enclave), an example of which is shown in the figure below. TEEs ensure there is no way to view data or the operations inside from the outside, even with a debugger. They even ensure that only authorized code is permitted to access data. If the code is altered or tampered, the operations are denied and the environment disabled. The TEE enforces these protections throughout the execution of code within it.

  • Vertical integration reaches new heights. Apple eats other chip makers as they dip. Apple A11 Bionic Chip Has 6 Cores 4 Billion Transistors And 70% Faster Multi-Thread Workloads: offers 30% power efficiency over the A10. It also offers 50% performance boost over the chip...6 cores and is tweaked with photography in mind. The A11 Bionic faster low light autofocus, an improved pixel processor and reduces multiband noise. That’s impressive and builds on the custom ISP Cupertino introduced last year. A11 Bionic Chip in iPhone 8 and iPhone X on Par With 13-Inch MacBook Pro, Outperforms iPad Pro: on par with the chips in Apple's latest 13-inch MacBook Pro models...Poole believes the two high performance cores in the A11 are running at 2.5GHz, up from 2.34GHz in the A10. Apple iPhone X A11 Bionic 6-Core CPU Crushes All Android Challengers In Benchmark Leak: reportedly built on a 10-nanometer FinFET process...We suspect the two power efficiency cores will perform the bulk medial chores to maintain battery life, which Apple says will be 2 hours longer than the iPhone 7. But for heavy lifting, the chip is capable of not only firing up its four high performance cores, it can tap all six cores simultaneously. Combined with a burly GPU, the A11 Bionic looks like a fierce chip...In single-core performance, the phone scored 4,061 points, and nearly hit 10,000 points (9,959, to be exact) in multi-core performance...Samsung's Galaxy S8+ scored 1,845 points in single-core performance and 6,333 points in multi-core performance...That's what you call a beat-down.

  • Awesome detailed analysis of the Struts vulnerability that led to the Equifax debacle. Using a framework is not a one time action. Installing upgrades is part of the deal. Which, of course, few of us do. Like OS updates, all this needs to be automated. An Analysis Of CVE-2017-5638. Lessons: you can’t always rely on reading CVE descriptions to understand how a vulnerability works. The reason this vulnerability was ever possible was because the file upload interceptor attempted to resolve error messages using a potentially dangerous function that evaluates OGNL; you can’t rely on using known attack signatures to block exploitation at the web application firewall level. For example, if a web application firewall were configured to look for OGNL in the content-type header, it would miss the additional attack vector explained in this post. The only reliable way to eliminate vulnerabilities like this one is to apply available patches, either manually or by installing updates.

  • Videos from Node Summit 2017 are available.

  • Are webhooks out for system integration? Why is Serverless Extensibility better than Webhooks?: A new approach is starting to emerge in the industry, one that we call Serverless Extensibility. Instead of placing the burden on customers, SaaS products can leverage a Serverless platform to allow users to author and execute extensions directly in the product.

  • Containerization at Pinterest. Look at the before and after architecture pictures. The container based architecture is so much simpler. The wins: developer velocity by eliminating the need to learn tools like Puppet. Provide an immutable infrastructure for better reliability. Increase our agility to upgrade our underlying infrastructure. Improve the efficiency of our infrastructure.

  • You sit all day. I know you do. Here's some advice for scalable sitting. Mastering Posture, Pain & Performance in 4 Minutes a Day with Egoscue - Brian Bradley. Is it good? Unfortunately there's little in the way of specifics, but the idea of sitting as a sport that you have to train for is at least 10x more realistic than being told don't sit so much when your job involves a lot of butt in the seat time.

  • Ivan Pepelnjak with a great boots on the ground interview. Networking Trends Discussion with Andrew Lerner and Richard Simon. Microsoft recommends users go over the internet to connect to Office 365 instead of using an ExpressRoute (think DirectConnect). Using DNS with anycast means you can enter the MS network as close as possible to where you are. ExpressRoute works for Azure but not Office 365. Years ago people were afraid of the cloud. Now people are trusting Amazon, etc. People who worry about colocation or lock-in use colocation, like Equinix, put their data there, and use ExpressRoute and DirectConnect to connect. You can run workloads in AWS and have your storage in Equinix. This option is getting more consideration in a multi-cloud world and in the IoT world, where there are multiple IoT platforms. A lot of data is being collected. Where should it sit? Datacenter or edge? Maybe the edge is just a colo site that has ExpressRoute and DirectConnect. There's a desire in enterprise to reduce the physical footprint that they manage. This can mean moving to the public cloud, or in colo, or using denser compute to maximize the use of their current space. We see a shrinking of datacenters because there are more powerful computers. Racks are denser and there's more in the cloud. There are more questions now about shrinking datacenters than expanding. Innovation in networking has come from large hyperscalers, dogged automation, ruthless standardization, scale out commodity components. They have a 25,000 to one server to administrator ratio, in the enterprise it's 100 to 1. Moving to an SRE approach would be a big culture change. The number one challenge the enterprise has is to overcome the mindset of incrementalism, that I can do 5-10% better than last year and everything will be fine. While tactically often right, incrementalism is strategically wrong. You get fired for an outage. Risk aversion stifles innovation. You don't automate.
Irony is, constant incremental change means more technical debt, which adds fragility, which reduces availability. We need to reward networking teams differently. There are things Google does that we can emulate. Pick the low hanging fruit in a pilot project. Shift the spend from premium products to premium people, which reduces the overall spend on IT by 25% over 5 years. Train the people you have to move up the stack. We need to get into the mentality of fail fast, fail small, get more rehearsed at troubleshooting, more windows where we make changes rather than the monthly big bang maintenance window, get into a devops mindset of blameless postmortems, learn from failures don't hide them, build smaller failure domains, it matters how we measure, don't just measure uptime, measure time to detect. Most network engineers are ready to change. The problem is at the top.

  • Before the cloud there were many proto-clouds anticipating the future, if not quite inventing it. Here's the fascinating story of Tymshare's Tymnet as told by Ann Hardy, in an oral history interview with the Computer History Museum. Someone Else’s Computer: The Prehistory of Cloud Computing (transcript). If Tymnet hadn't been sold to McDonnell Douglas and later British Telecom, who knows?

  • Become a nerd farmer. If you'd like a more technological take on farming then Open Agriculture Initiative (OpenAg) is for you. Don't bother digging up a plot of land, build a Food Computer. Even the indoor farms are looking a bit like datacenters. Good interview on The Splendid Table.

  • What's So Bad About Posix I/O? notacoward: The author brings up a lot of good points, but they're mostly good within the context of HPC. Out in the broader universe where most data lives, many people do require the full suite of POSIX namespace, metadata, and permission semantics. Yes, even locking, no matter how many times we tell them that relying on that in a distributed system is Doing It Wrong. I know because I support their crazy antics at one of the biggest internet companies. The author's on much firmer ground when he talks about POSIX consistency semantics. While we can't control misguided applications' (ab)use of locks, we can certainly offer models that don't require serializing things that the user never wanted or expected to be serialized, or making them synchronous likewise.

  • Just a great overview of the modern computing ecosystem today. The Amp Hour #357 - An Interview with Rick Altherr [Google]. All the things are covered.

  • Maybe we're going about this whole devops thing wrong? Rather than reacting to problems maybe we should be predicting problems? A constructed devops. Lisa Barrett on How Emotions are Made. The most efficient way to run a system is not to have the system dormant, doing nothing, and then be stimulated into reacting. That's inefficient. The most efficient systems are predicting and then using input to correct those predictions. The brain is organized to predict, not react. It feels like we're reacting to events in the world, but your brain is constantly predicting what will happen next and these predictions are the basis of your emotions. Based on your situation right now networks in your brain are predicting what will happen in a moment from now. When the moment arrives it uses the sensory input to correct the predictions and that becomes your sensory experience. Learning is when your brain takes in information it didn't already predict. The brain manages your body predictively. While your brain is creating your thoughts, feelings, and perceptions, it's also managing all your systems, keeping them in balance. The sensory consequences of these changes in your body, your heartbeat, your lungs expanding, is a set of sensations your brain is predicting, just like it's predicting what you're going to see, hear, taste, and so on. We experience these sensations from our bodies as simple feelings, feeling pleasant, or unpleasant, feeling worked up, or feeling calm. Your brain is predicting these, while partially constructing them, at the same time it's predicting what you'll see, hear, taste, and so on. Together these predictions are what we refer to as concepts. Your brain is a master of deception. It's creating experiences and directing your actions with a magician's skill, never revealing to you how it does so, all the while giving you a false sense of confidence that its products, your experiences, reveal its inner workings.

  • Easy way to save some $$$. How to save on AWS Elastic Beanstalk EC2 machines by putting them to sleep.

  • Side Effects, Front and Center!: For example, when the hotel reservation is cancelled because I chose not to go to Europe, that probably didn't change the order for groceries. Perhaps my reservation pushed the occupancy to 200 rooms and a new level of demand for the restaurant. Most likely, the expected occupancy will need to drop to 180 or so before the hotel will fiddle with the grocery order. Repeatedly calling the grocer to schedule, then cancel, then schedule deliveries is likely to drive the grocer to remove you from its list of customers.

  • This will be a tricky multi-cloud strategy to execute, but VMware is doing the deals because public clouds want some of that enterprise sugar. VMware Soars As New Cloud Powerhouse On Deals With Amazon, Microsoft, IBM And Google

  • Like all new technologies, robot farming will start out more expensive and less efficient, then the exponential development curve will kick in and the world will change. Autonomous Robots Plant, Tend, and Harvest Entire Crop of Barley: During the Hands Free Hectare project, no human set foot on the field between planting and harvest—everything was done by robots. This includes: Drilling channels in the dirt for barley seeds to be planted at specific depths and intervals with an autonomous tractor; Spraying a series of fungicides, herbicides, and fertilizers when and where necessary; Harvesting the barley with an autonomous combine...To make these decisions, robot scouts (including drones and ground robots) surveyed the field from time to time, sending back measurements and bringing back samples for humans to have a look at from the comfort of someplace warm and dry and clean...Overall, the field produced 4.5 metric tons per hectare, which is significantly less than the average of 6.8 metric tons per hectare that you could expect from conventional (human-intensive) farming methods.

  • How do you add search to your product? What every software engineer should know about search. nostrademons: Ex-Google search engineer here, now using hosted ElasticSearch extensively in my startup. This is a really good overview. If there's one part I want to highlight, it's that you should expect to spend a lot of time fine-tuning your ranking function for your particular product & corpus. The default ElasticSearch ranking function kinda sucks. It was changed in ES 5.0 to Okapi BM25, which is the current academic state-of-the-art in non-machine-learned ranking functions. However, search is one field where the current academic state-of-the-art is at least a couple decades behind where things are in industry. When you use a service with good search that just works, chances are that there's been a lot of engineer hours devoted to identifying exactly which signals are most useful in your corpus, how they relate to subjective evaluations of relevance, and how to clean them up so that noise doesn't dominate the signal.
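
For a feel of what Okapi BM25 computes before you start tuning on top of it, here is a minimal Python sketch. It is not Elasticsearch's code: the function name `bm25_scores` is hypothetical, `k1=1.2` and `b=0.75` are just commonly used defaults, and the smoothed IDF is one common variant of the formula.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: the number of docs each term appears in.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant) that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for doc length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Everything here is driven by term counts and document length alone, which is why the product-specific signals nostrademons describes have to be layered on separately.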

  • Interesting discussion with Kristen Dorsey on everything MEMS. Tiny Sensor Problems. Unfortunately they aren't solving all the world's problems yet, but they do some cool things.

  • Maybe isolating software components into microservices is not the right approach? Lisa Barrett on How Emotions are Made. Forget universal facial expressions. You do not have a neural essence for each emotion that is baked into your brain from birth. You do not have a physical fingerprint for each emotion, where every time you have that emotion you have a tendency towards a specific action, a specific body state, a specific facial expression. There's not a single region of the brain responsible for emotion. The truth is much more interesting than that. Variability is the norm. Emotions can't be localized to particular brain regions. Emotions can't be localized to networks. Divide the brain into 180,000 voxels. Look at the pattern of activity across the voxels for someone feeling anger, you can identify the pattern of voxel activation associated with anger, yet not predict voxels themselves. Variability is the key to robustness. Each emotion, anger, fear, etc., is a highly variable set of instances that are tied specifically to the situation you are in. Your brain is able to make not just one anger, but a whole variety of angers, each one tailored to the situation you are in. There's no fingerprint because there's more than one way to do it. Also, you can't carve the brain up into mental organs. The billions of neurons in your brain can be understood as a series of smaller networks. These networks are not independent of each other. They actually share neurons. In your brain networks overlap and share common hubs. Each network performs more than one function. The same networks involved in making emotion also make thoughts, memories, and perceptions. When you look at patterns of activity with emotion, all the networks of the brain are engaged, but the ones most highly engaged are the same ones important for making concepts, for controlling your body. The exact same networks.
Our brain constructs our emotional experiences using the same circuits and functions it uses to construct everything else we experience, including our perception of the world.

  • Genius! @cblatts: This elevator has a call button 30 feet away so that the elevator is there when you arrive

  • Julia Joins Petaflop Club: Julia has joined the rarefied ranks of computing languages that have achieved peak performance exceeding one petaflop per second – the so-called ‘Petaflop Club’...the Celeste team developed a new parallel computing method to process the entire SDSS dataset. Celeste is written entirely in Julia, and the Celeste team loaded an aggregate of 178 terabytes of image data to produce the most accurate catalog of 188 million astronomical objects in just 14.6 minutes with state-of-the-art point and uncertainty estimates...Celeste achieved peak performance of 1.54 petaflops using 1.3 million threads on 9,300 Knights Landing (KNL) nodes of the Cori supercomputer at NERSC – a performance improvement of 1,000x in single-threaded execution.

  • Transparent Hugepages: measuring the performance impact: Let’s take a look at how Transparent Hugepages affect a real-world application. Given a JVM application...Yes, you see it right! More than 10% of CPU cycles were spent doing the page table walking...Let’s turn the THP on...the number of TLB misses dropped by 6 times from ~130 million to ~20 million. Miss/hit rate dropped from 1% to 0.15%...We spend only 2% of CPU time walking the page table... RAM reads also dropped from 1 million to 350K.
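
Before measuring anything, it helps to know which THP mode a box is actually in. On Linux the kernel exposes it in `/sys/kernel/mm/transparent_hugepage/enabled`, with the active mode shown in square brackets, for example `always [madvise] never`. A minimal Python sketch for reading it follows; the helper names `parse_thp_mode` and `current_thp_mode` are made up for illustration.

```python
import re
from pathlib import Path

# Standard sysfs location of the THP setting on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode, which the kernel marks with square brackets."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no bracketed mode in {text!r}")
    return match.group(1)


def current_thp_mode(path: Path = THP_ENABLED):
    """Return the active THP mode, or None if the file cannot be read
    (for example on a non-Linux system or a kernel built without THP)."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None
```

Calling `current_thp_mode()` returns one of `always`, `madvise`, or `never` on a typical Linux system, and `None` where the sysfs file is absent.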

  • You Can’t Stay Here: The Efficacy of Reddit’s 2015 Ban Examined Through Hate Speech: In this paper, we studied the 2015 ban of two hate communities on Reddit, r/fatpeoplehate and r/CoonTown. Looking at the causal effects of the ban on both participating users and affected communities, we found that the ban served a number of useful purposes for Reddit. Users participating in the banned subreddits either left the site or (for those who remained) dramatically reduced their hate speech usage. Communities that inherited the displaced activity of these users did not suffer from an increase in hate speech.

  • ActivityPub: from decentralized to distributed social networks: a protocol being developed at the W3C for the purpose of building federated social systems. Users can use implementations of ActivityPub like Mastodon and MediaGoblin as libre alternatives to large siloed social networking systems such as Facebook, Twitter, YouTube, and Instagram.

  • Firmament: fast, centralized cluster scheduling at scale: This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at subsecond placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20× over Quincy [22], a prior centralized scheduler using the same MCMF optimization. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used centralized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6×.
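
To see how task placement can be phrased as min-cost max-flow at all, here is a toy Python sketch. It is emphatically not Firmament: it uses a textbook successive-shortest-path solver and rebuilds the network from scratch, whereas the paper combines multiple MCMF algorithms and solves incrementally. The class `MinCostFlow`, the function `place_tasks`, the task → machine → sink layout, and the cost matrix are all assumptions made for illustration.

```python
from collections import deque


class MinCostFlow:
    """Textbook successive-shortest-path min-cost max-flow solver."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # per-node lists of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # A forward edge and its residual twin sit at indices e and e ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        flow = total_cost = 0
        while True:
            # Queue-based Bellman-Ford finds the cheapest augmenting path.
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev_edge[t] == -1:
                return flow, total_cost  # no augmenting path is left
            # Find the bottleneck capacity, then push flow along the path.
            push, v = float("inf"), t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total_cost += push * dist[t]


def place_tasks(costs, slots):
    """costs[i][j]: cost of task i on machine j; slots[j]: free slots on j."""
    n_tasks, n_machines = len(costs), len(slots)
    source, sink = 0, n_tasks + n_machines + 1
    mcf = MinCostFlow(sink + 1)
    task_edges = {}
    for i in range(n_tasks):
        mcf.add_edge(source, 1 + i, 1, 0)
        for j in range(n_machines):
            task_edges[(i, j)] = len(mcf.to)  # index of the forward edge
            mcf.add_edge(1 + i, 1 + n_tasks + j, 1, costs[i][j])
    for j in range(n_machines):
        mcf.add_edge(1 + n_tasks + j, sink, slots[j], 0)
    _, total = mcf.solve(source, sink)
    # A saturated task -> machine edge means the task was placed there.
    placement = {i: j for (i, j), e in task_edges.items() if mcf.cap[e] == 0}
    return placement, total
```

With `costs = [[1, 5], [2, 1], [4, 3]]` and `slots = [1, 2]`, the solver puts task 0 on machine 0 and tasks 1 and 2 on machine 1 for a total cost of 5. Re-running this from scratch on every change is the naive approach; getting the same optimization to subsecond latency on ten thousand machines is the part the paper is about.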

Hey, just letting you know I've written a new book: A Short Explanation of the Cloud that Will Make You Feel Smarter: Tech For Mature Adults. It's pretty much exactly what the title says it is. If you've ever tried to explain the cloud to someone, but had no idea what to say, send them this book.

I've also written a novella: The Strange Trial of Ciri: The First Sentient AI. It explores the idea of how a sentient AI might arise as ripped from the headlines deep learning techniques are applied to large social networks. Anyway, I like the story. If you do too please consider giving it a review on Amazon.

Thanks for your support!

Reader Comments (6)

"on weaponized social media: "We’re in an information war with Russia. It’s time we started acting like it."

When you can't help but put your crazy neo-con/leftist ideology everywhere, even in your unrelated job.

September 15, 2017 | Unregistered CommenterPino

When you can't help but react ideologically everywhere, even when you have no idea what you are talking about.

September 15, 2017 | Registered CommenterTodd Hoff

"ActivityPub: from decentralized to distributed social networks" links to your local file url ;)

September 15, 2017 | Unregistered Commenterzbb

The ActivityPub link points to a local file. Excellent post as usual!

September 16, 2017 | Unregistered Commentermc

When you think that following a blog obligates the author to write what you want to read.

September 16, 2017 | Unregistered CommenterMichael

Whoops, sorry about that.

Here's the correct link:

September 16, 2017 | Registered CommenterTodd Hoff
