Stuff The Internet Says On Scalability For September 8th, 2017

Hey, it's HighScalability time:

May you live in interesting times. China games swarming drone attacks. Portable EMP anyone? (Tech in Asia)

If you like this sort of Stuff then please support me on Patreon.

  • 100GB: entire corpus of articles written at the NY Times; 80GB: data for one human genome; 3%: Linux desktop market share; 3.5M: fake Wells Fargo accounts; $18,000: world’s most expensive vacuum; 2000: Netflix recommender taste groups; 27%: year-over-year growth rate of Python on SO; 4M: Time Warner hacked; 143M: Equifax hacked; $800M: ICO funding in Q2; $257M: Filecoin ICO.

  • Quotable Quotes:
    • Brendan Gregg: jobs are also migrating from both Solaris and Linux to cloud jobs instead, specifically AWS. The market for OS and kernel development roles is actually shrinking a little. The OS is becoming a forgotten cog in a much larger cloud-based system. The job growth is in distributed systems, cloud SRE, data science, cloud network engineering, traffic and chaos engineering, container scheduling, and other new roles. 
    • @DrQz: The Performance Paradox: The better u do ur job, the more invisible u become. https://goo.gl/1aTRvw  🐵 🙄
    • @kennwhite: $100,000+ spent on thousands of [Facebook] ads, tied to 470 fake accounts, all linked to a propaganda troll farm with ~600 staff in St. Petersburg.
    • marssaxman: 10.6.8 was the best Mac OS ever. Since then I've felt increasingly uncomfortable with the heavy-handed, paternalistic direction Apple has been taking their OS; it just doesn't feel like home anymore. I believe in personal computers as tools of personal empowerment; it's my machine, not Apple's. I really resent being told what I can and can't do with it, and I neither need nor want an iTunes account.
    • @jemangs: "Amazon spent $16.1 billion on R&D last year, a figure that should strike fear into its competitors" - Recode
    • @xaprb: OH: "I have some really junior staff and they were bitching about having to wait 5 minutes for an EC2 instance. GET OFFA MY LAWN."
    • Nora Jones: Chaos doesn't cause problems, it reveals them. 
    • Littlefinger: Chaos is a ladder.
    • Ken Stanley: sometimes in order to make discovery possible, you have to stop having an objective
    • GeneticGenesis: Whenever a "config change" (note: this includes adding or removing targets in a target group, e.g. autoscaling) happens on an ALB, the ALB drops all active connections and re-establishes them at once; at high load, this obviously causes significant load spikes on any underlying service.
    • Stefano Bernardi: Call me old fashioned, but wanting to raise half a billion dollars for a pre-product endeavor is absolutely f*cking insane.
    • revscat: This was my first experience with modern JavaScript frameworks and TypeScript. I wanted to do it right, so worked closely with team members who were more versed in this stuff, and followed the various recommended best practices. By the time all was said and done the PR for this thing had 27 files in it. For a modal. This seems ludicrous to me. 
    • Tony Seba: [on disruption] Technology convergence is when several technologies and business model innovations converge at one point in time to enable functionality at a certain cost.
    • Tony Seba: Business model innovation is every bit as disruptive as technological innovation.
    • Tony Seba: By 2030 95% of all passenger miles are going to be autonomous electric vehicles. There goes the internal combustion engine industry. There goes the individual ownership of cars. We will have cars as a service just as we have movies as a service.
    • @Noahpinion: 15 years ago, the internet was an escape from the real world. Now, the real world is an escape from the internet.
    • @jbeda: Hot take: [new AWS LB] similar to but more limited than GCP L3 LB. AWS LB is zonal and looks to do NAT. GCP L3 LB supports anycast across regions and DSR.
    • @GossiTheDog: Tip - if you want in to a bunch of factory networks, don't target the companies - target their ICS suppliers. Find names via case studies.
    • @GossiTheDog: Because vendors usually self-manage black-box VPN appliances at sites, the actual company doesn't see logs = doesn't know they are owned.
    • @ftrain: Giant company: We are geniuses worth a trillion dollars. Me: I would like to log into two different accounts at once. Company: Holy shit.
    • @GossiTheDog: Equifax's infrastructure is a weird mix of IBM WebSphere, Apache Struts, Java.. it's like stepping back in time a decade.
    • catvalente: “The Internet used to be full of original content & lively debate” is the new “in my day we walked to school in the snow uphill both ways”
    • Dan Wang: There are plenty of service jobs that are meant to cancel out the efforts of other service jobs. One firm spends a few million dollars to hire a dozen ad agencies to convince you to purchase its car insurance policies; another firm hires a different dozen agencies and puts out other ads. 
    • @reubenbond: Startup idea: ♦️Buy low grade microservices for cheap ♦️Bundle them together to diversify risk ♦️Split the result & sell them as A-grade
    • Lant Pritchett~ why are the scarcest economic resources—entrepreneurial ability and technical talent—going into automating an abundant resource: cheap labor?
    • Daniel J. Boorstin: We risk being the first people in history to have been able to make their illusions so vivid, so persuasive, so ‘realistic’ that they can live in them. 
    • Silhouette: A lot of software today seems to be more like movie or sports franchises: once you've found a winning formula, you just keep cranking it out with slight variations from one year to the next. After all, as long as there are enough suckers in the market to pay your bills if you do that, what's to stop you?
    • Nassim Nicholas Taleb: If humans were immortals, they would go extinct from an accident, or from a gradual buildup of misfitness. But shorter shelf life for humans allows genetic changes to accompany the variability in the environment.
    • ksec: I think many have argued that an app is a product, and you don't upgrade your washing machine, set-top box, or router; you buy a new one. Hence you buy a new app. That is all perfectly fine, but it turns out most users (not developers or geeks) will only buy a new product when their old one stops working or becomes OUTDATED. Since most apps will continue to work as they are, users have no incentive to buy a newer version. That is why developers want a subscription model. It is hard to keep selling to your existing user pool; a rental model allows a continuous revenue stream, whereas an upgrade model is basically offering existing users a discount instead of telling them to buy a new version.
    • @ProfDaveAndress: [re:Y2K] This is a very important general point: people are far too inclined to believe that a crisis averted was never a crisis at all.
    • Jordan Vogt-Roberts: Satire is not bankruptcy. You can't just declare it.
    • @mrmoneymustache: Neat math: total cost of Hurricane Harvey ($200B) is roughly equal to cost to replace all remaining US coal power plants (200GW) with solar.
    • @msuriar: Good: script database failover. Better: detect failure and automate database failover. Better++: remove need for global write master #SREcon
    • @msuriar: Iron rule: no more snowflake servers. Nothing that needs manual handling. #SREcon
    • @copyconstruct: "Node is not the best system to build a massive server web. I would definitely use Go for that" - creator of @nodejs
    • @lizthegrey: "War rooms" are responses to old tools and patterns that we don't need any more. #SREcon
    • Kasper Kubica: So for the big companies, I truly believe these confusing websites, these websites that avoid at all costs telling you what the company actually does, are a deliberate tactic.
    • @martinkl: I love how the @nytimes uses a single-partition Kafka topic to store every single article they published since 1851!
    • oppositelock: When designing C++ code for threading, you have to keep a few things in mind, but it's not particularly harder than in other languages with thread support. You generally want to create re-entrant functions which pass all of their state as input, and return all results as output, without modifying any kind of global state. You can run any number of these in parallel without thread safety issues, or locks.
    • ethbro: Your scalable solution is worthless without last-mile integration, and that integration is always bespoke. 
    • @QuinnyPig: #SREcon @postwait "I just want a fucking clock that works." Amen.
    • Packet Pushers: The networking world has always scaled up and we haven't figured out how to scale out like the server and the storage guys. 
    • @logic_magazine: In 1983, female computer scientists at MIT wrote a report on how sexism was pushing women out of their--historically female--field. #Tbt
    • @lizthegrey: Need automatic rollbacks, extremely tight operations. "You don't have time to file a ticket with your cloud provider." #SREcon
    • @alexismadrigal: Reading older papers on infrastructure and I'm struck by how the phrase "public works" has fallen out of use.
    • jpz: I've written a lot of multithreaded C++, starting around 1994 and for 10+ years after, a lot of middle-tier application code on Windows NT in particular (back in the day when it was all 3-tier architectures - now we'd call them services). It's totally fine if you know what you are doing. Work is usually organised in mutex-protected queues: worker threads consume from these queues, results are placed in a protected data structure, and receivers poll and sleep, waiting for the result. Another trick to remember is to establish a hierarchy of mutexes. (See the sketch of this worker-queue pattern just after this list.)
    • joegahona: We'd been using AMP at the publication I work for since late Oct 2016. I finally got around to comparing AMP vs non-AMP performance in Jun 2017, and in _every_ case I could find in Search Console, our site version was outperforming the AMP counterpart on mobile, most notably in search position and conversions. AMP was causing problems in other ways (tons of external calls, which were stressing our servers), so it was a good excuse to ditch it. Eager to see what the results are after 90 days of AMP-less traffic, but so far it's a relief not having to worry about it. It's important to recognize how much extra work things like AMP, Facebook Instant Articles, Apple News, whatever Amazon dreams up next, etc. etc., dump on your development team -- the maintenance alone can swallow up countless hours.
    • T-Cell World: When the infection is under control, most of the newly activated T cells die, but some of them remain as memory cells, which stay prepared to quickly combat another infection of the same type. Whereas activation of naive T cells takes more than a week, memory T cells can respond to a secondary infection within hours.
    • A Mind at Play: to communicate is to make oneself predictable.
    • joezydeco: The problem with IoT isn't the backend. There are plenty of companies that figured out the servers, ingestion, security, dashboarding, etc. The problem is all the nodes. Customers aren't going to replace equipment with newer systems just because it has IoT capabilities, which means you're attempting to retrofit machinery with sensors and connectivity. Or else you wait until the major chain customer has refreshed every single piece of equipment in every store. Set your calendar for 7-10 years and check back in. For retrofitting, every single case is different, it's custom, and 90% of the time it's not easy. And, no, slapping a Raspberry Pi to the side of a milkshake freezer isn't the answer. Some products like Helium are closer, but an array of open-collector GPIOs isn't the answer either. The only way to win here is to be highly vertical and close to your customers not only in business knowledge but actual integration with the equipment makers. I certainly don't see GE, MS, AWS, Google or anyone else really making the commitment to that kind of stuff.
    • exelius: Amazon buys (and sells) data to/from DMPs. That data can (and often does) include a hash of your credit cards, all the e-mail addresses you go by, etc. Amazon can basically buy programmable ad inventory that says "I want to show this ad for chainsaws to kmonad" and the DMP resolves who 'kmonad' is through a variety of methods. Realistically, the opsec you would need to have to avoid this would be astronomically inconvenient. These DMPs work off statistics, so they don't need to know 100% that this browser session is probably kmonad, just 70%. Maybe you have the same IP, OS version, browser extensions, cookie sets...
    • empath75: Consider the implications of this security breach [Equifax] if it's a state actor that did it. I'm going to throw out Russia as an example, but don't take that as me accusing them of doing it. Cross-reference financial information on millions of Americans with data breaches from Yahoo and LinkedIn, and the social graph data that's freely available from both, and you have a serious national security problem. It would be easy to search for employees with serious financial problems at any institution you wanted to target with either blackmail or further intrusions.
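
Both of those threading comments (oppositelock's re-entrant functions and jpz's mutex-protected work queues) describe the same shape of solution. Here's a minimal Go sketch of the pattern, with channels standing in for the C++ mutex-protected queues and a trivial squaring function standing in for real work; it's an illustration of the idea, not anyone's production code:

```go
package main

import (
	"fmt"
	"sync"
)

// process is re-entrant in oppositelock's sense: all state arrives as
// input, all results leave as output, and no global state is touched,
// so any number of copies can run in parallel without locks.
func process(job int) int {
	return job * job
}

func main() {
	jobs := make(chan int)    // the work queue (a channel plays the mutex-protected-queue role)
	results := make(chan int) // the protected structure results are placed into

	var wg sync.WaitGroup
	for w := 0; w < 4; w++ { // worker threads consume from the queue
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				results <- process(job)
			}
		}()
	}

	go func() { // producer: enqueue work, then close the queue
		for i := 1; i <= 10; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	go func() { // close results once every worker has drained the queue
		wg.Wait()
		close(results)
	}()

	// The receiver blocks on the channel instead of poll-and-sleep.
	for r := range results {
		fmt.Println(r)
	}
}
```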

  • An interesting back-to-the-future trend in podcast monetization: podcasts sponsored by a single sponsor. The Cloud Cast is sponsored by A Cloud Guru. Exponent is sponsored by MailChimp. It's like old time radio, only the ads aren't nearly as entertaining. I mean, who doesn't remember Jack Benny and Dennis Day in this classic Texaco ad?

  • Forget that beer guy, Tim O'Reilly is the most interesting man in the world. Great interview on This Week in Startups on Tim's new book: E761: Tim O'Reilly's "WTF: What’s the Future & Why It’s Up to Us”. At a whopping $17 for the kindle version, open source it's not. The key idea is that the economy is a platform; companies like Google, Facebook, and Amazon are also platforms. Both kinds of platforms suck more value from the ecosystem than they add, killing the ecosystem and siphoning off the money that would have gone to higher-paying middle-class jobs. The economy today is not producing value for everyone. You see this in the history of platforms. The PC started with a burst of innovation, creating a huge opportunity for thousands of small software companies and PC makers. Then the industry became dominated by a few companies. The internet also started with a burst of energy and opportunity. Over 10-15 years it too has become dominated by a few major players. Companies become dominant, forget they are part of an ecosystem, and end up strangling it. That's a general description of our economy. It used to be, after the Great Depression and WWII, that the system optimized for good middle-class jobs. That was the stated goal: making sure people had good-paying jobs. In the 70s and 80s that flipped. The goal became "shareholder value." Take care of the shareholders, align management interests with the shareholders, and you'll make the economy more successful...at the expense of employees. Here's where the analogy comes in with platform-oriented companies like Amazon, Facebook, and Google. Rather than keep a good thing going, the economy and these companies eat their young, driven by the master algorithm that says keep growing shareholder value. So Google has to eat Yelp, and all the new categories it once sent traffic to. They betray their former partners. This tragedy happens again and again. Why not optimize for the steady-state solution? Understand that you need an ecosystem of small companies. That you are a platform for other people. If you take too much out of the platform the entire ecosystem dies. This is also what is happening in the economy at large. We should learn the lessons from platforms: do not extract too much value from the ecosystem.

  • What will happen when everyone is wearing AR glasses? Detecting cheating will be impossible. Though this really was just lazy programming. WatchKit has an API for playing haptic feedback. No need to look down at the watch. Boston Red Sox Used Apple Watches to Steal Signs Against Yankees: video showed a member of the Red Sox training staff looking at his Apple Watch in the dugout. The trainer then relayed a message to other players in the dugout, who, in turn, would signal teammates on the field about the type of pitch that was about to be thrown. 

  • Epic post from Dropbox on CPU, NIC, Memory, Hard Drive, Firmware, Drivers, CPU Affinity, PCIe, Interrupt Affinity, networking stack, Fair Queueing and Pacing, TSO autosizing and TSQ, Congestion Control, Sysctls, Tooling, Compiler Toolchain, compression, TLS, and much much more. Optimizing web servers for high throughput and low latency: the Dropbox edge network is an nginx-based proxy tier designed to handle both latency-sensitive metadata transactions and high-throughput data transfers. In a system that is handling tens of gigabits per second while simultaneously processing tens of thousands of latency-sensitive transactions, there are efficiency/performance optimizations throughout the proxy stack, from drivers and interrupts, through TCP/IP and the kernel, to library and application level tunings.

  • When Murat says he's getting excited about the blockchain, it might be time to take a look. Paper summary. Untangling Blockchain: A Data Processing View of Blockchain Systems: I love that blockchains bring new constraints and requirements to the consensus problem. In blockchains, the participants can now be Byzantine, motivated by financial gains. And it is not sufficient to limit the consensus participants to 3 nodes or 5 nodes, which was enough for tolerating crashes and ensuring persistence of the data. In blockchains, for reasons of attestability and tolerating colluding groups of Byzantine participants, it is preferred to keep the participants in the 100s. Thus the problem becomes: how do you design a Byzantine-tolerant consensus algorithm that scales to 100s or 1000s of participants? I love when applications push us to invent new distributed algorithms.

  • The world, she is changing, exponentially. Tony Seba: Clean Disruption - Energy & Transportation. In 13 years NYC went from everyone riding horses to everyone driving cars. That's called a disruption. In 1984 McKinsey forecast that cell phone adoption in 2000 would be 900,000; the actual number was 109 million. The internet and smartphones were all missed by very smart people. Kodak in 2000 had record financial results. By 2012 they had filed for bankruptcy. Experts and insiders are the ones who miss disruptive opportunities. Why do smart people miss predicting, let alone leading, disruptions? Disruptions occur when technologies converge. The smartphone was created in 2007 by the convergence of battery technology, touch screens, microprocessors, and the cell network. Technologies get adopted as an s-curve. No successful technology in history has been adopted linearly. Once you hit the tipping point it disrupts the existing market and is adopted exponentially, going from 1-2% of the market to 80% in a snap, enabled by technology convergence. Business model innovation is every bit as disruptive as technology. Uber books more than the entire taxi industry within 8 years. Uber is a business model innovation enabled by the cloud and the smartphone. They took advantage of this convergence and disintermediation. Same for Airbnb, which is actually an old business model, brokers. We've had brokers for hundreds of years, but not in this market. Over the next 13 years there will be disruptions in batteries, electric vehicles, autonomous vehicles, ride-sharing, and solar. It will happen for purely economic reasons. Batteries are becoming so cheap everything will have energy storage. Energy storage will be like data storage to computers. An EV has 100x fewer parts than a gas vehicle, is 10x cheaper to fuel, 10x cheaper to maintain, and has a 2.5x longer lifetime. By 2025 every new vehicle will be electric. Every time there has been a 10x improvement in cost there has been a disruption. The Gutenberg bible was 10x cheaper than a manuscript. In 2020 solar + storage will be below 7 cents. Everyone will make the selfish rational decision to go solar.

  • About time. TIME SERIES DATABASE LECTURES - FALL 2017: a semester-long seminar series featuring speakers from the leading developers of time series and streaming data management systems. Each speaker will present the implementation details of their respective systems and examples of the technical challenges that they faced when working with real-world customers.

  • Here's How Uploadcare Built a Stack That Handles 350M File API Requests Per Day: Uploadcare has built an infinitely scalable infrastructure by leveraging AWS...runs on Python...built the main application with Django because of its feature completeness and large footprint within the Python ecosystem...use PostgreSQL as our database because it is considered an industry standard when it comes to clustering and scaling...Uploaded files are received by the Django app, where the majority of the heavy lifting is done by Celery...Celery handles uploading large files, retrieving files from different upload sources, storing files, and pushing files to Amazon S3. All the communications with external sources are handled by separate Amazon EC2 instances, where load balancing is handled by AWS Elastic Load Balancer...use Amazon S3 for storage...EC2 upload instances, REST API, and processing layer all communicate with S3 directly...file and user data are managed with a heavily customized Django REST framework...use the micro framework Flask to handle sensitive data and OAuth communications...many processing tasks such as image enhancements, resizing, filtering, face recognition, and GIF to video conversions...for IO-bound tasks aiohttp is the one we intend to implement in production in the near future as it uses asyncio, which is Python-native...Pillow-SIMD is 15 times faster than ImageMagick...For delivery, files are then pushed to the Akamai CDN...for deployment we use the GitHub Pull Request Model...Along with Slack, there's also G Suite for emails, Trello for planning, HelpScout and Intercom for customer success communications...we're using Segment to send data for analyses, which, in turn, are carried out by Kissmetrics, Keen IO, Intercom, and others. We’re using Stripe for processing payments...large files (25+ MB IIRC) are uploaded in chunks in parallel. And since in this case chunks are uploaded directly to S3, we're using S3 to check that files are uploaded correctly.
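
Those chunked, parallel large-file uploads map naturally onto S3's multipart upload API, which the official SDKs will drive concurrently for you. A minimal sketch in Go using aws-sdk-go's s3manager (Uploadcare's stack is Python/Celery, so this is only the shape of the idea; the bucket, key, and file names are made up):

```go
package main

import (
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))

	// s3manager splits the file into parts and uploads them concurrently,
	// much like the chunked parallel uploads described above.
	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 25 * 1024 * 1024 // 25 MB chunks, echoing the 25+ MB threshold mentioned
		u.Concurrency = 5             // number of parts uploaded in parallel
	})

	f, err := os.Open("bigfile.bin") // hypothetical local file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	out, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-upload-bucket"), // hypothetical bucket
		Key:    aws.String("uploads/bigfile.bin"),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("uploaded to", out.Location)
}
```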

  • Takeaways from SRECon17 Europe: managing SSH access without SSH keys by using Google OAuth before getting a certificate; Capturing and Analyzing Millions of Queries without Any Overhead; implementing custom ingestors to absorb logs and eventually store them in Prometheus; git-stacktrace; the BitTorrent protocol; integrating OpenResty with Zookeeper for auto-discovery.

  • By this definition programmers are engineers, but because all we as programmers have are words, whatever we build will always be incomplete. Vannevar Bush: Given the pipe wrench, produce the words for that wrench and no other; given the words, produce the wrench. That, Bush taught his students, was the beginning of engineering.

  • No, it's not a roving pod of dolphins that just watched A Clockwork Orange. The DolphinAttack: translates typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. Cheap devices can whisper commands to Siri, Alexa, etc., and we'll never know. A Simple Design Flaw Makes It Astoundingly Easy To Hack Siri And Alexa. This is too big a security hole to be an oversight, so why might it exist? Some companies are already exploiting ultrasonics for their own UX, including phone-to-gadget communication. Most notably, Amazon’s Dash Button pairs with the phone at frequencies reported to be around 18kHz, and Google’s Chromecast uses ultrasonic pairing, too.

  • That smart contracts are state machines should scare the hell out of anyone familiar with the lurking bugs hidden inside complex event driven state machines. Without tech to prove these contracts correct, smart contracts aren't. Step by step towards creating a safe smart contract: lessons from a cryptocurrency lab.

  • It has begun. Driverless trucks move all iron ore at Rio Tinto's Pilbara mines, in world first: What we have done is map out our entire mine and put that into a system, and the system then works out how to manoeuvre the trucks through the mine. The company is now operating 69 driverless trucks across its mines at Yandicoogina, Nammuldi and Hope Downs 4. The trucks can run 24 hours a day, 365 days a year, without a driver who needs bathroom or lunch breaks, which has industry insiders estimating each truck can save around 500 work hours a year. Mr Bennett said the technology takes away dangerous jobs while also slashing operating costs.

  • With the death of Solaris, if you are moving from Solaris to Linux, Brendan Gregg has written the post for you. Solaris to Linux Migration 2017. Brendan says switching from Solaris to Linux has become much easier in the last two years. He covers ZFS, containers, performance, security, Crash Dump & Debugging, and everything else you need to know when considering making the switch. 

  • Log-oriented systems are popular these days; Facebook has built their own. LogDevice: a distributed data store for logs. There are others: Apache BookKeeper, Kafka, Zlog, Humio. martincmartin: LogDevice emphasizes high write availability. So even if we're still sorting out the details of which records made it at the end of one epoch, we'll still take writes for future epochs. We just won't release them to readers until we've made enough copies of earlier records. We do our best to ensure that only one sequencer is running at a time. We use Zookeeper to store information about the current epoch & sequencer, and whenever a new sequencer starts, it needs to talk to Zookeeper. So that helps with races where several machines want to become the sequencer for a single log at the same time. Also, when clients are looking for the sequencer for a given log, they all try the same set of machines in the same order. Essentially there's a deterministic shuffling of the list of servers, seeded with the log id. So all clients will try to talk to the same server first, and only if that server is down, or learns that another sequencer is active, will the client try a different server...Within an epoch, the sequencer is a single process on a single machine and gives out LSNs sequentially. When a sequencer dies and a new one is started, its first job is to fix up the end of the last epoch. If it can't find any copies of a given record, it inserts a "hole plug" to store the fact that the record is lost. So, except for hole plugs (which should be very rare), the only gaps are between epochs.
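
The deterministic shuffle trick deserves a second look: if every client derives the same pseudo-random ordering of the server list from the log id, they all converge on the same sequencer candidate without any coordination. A toy Go sketch of the idea (not LogDevice's actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

// sequencerOrder returns the order in which a client tries servers when
// looking for the sequencer of a given log: a deterministic shuffle of
// the server list, seeded with the log id, as described above.
func sequencerOrder(servers []string, logID int64) []string {
	out := append([]string(nil), servers...)
	r := rand.New(rand.NewSource(logID)) // same log id => same permutation on every client
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	servers := []string{"s1", "s2", "s3", "s4"}
	// All clients compute the same order for log 42, so they all try the
	// same server first and only fall through to the next one on failure.
	fmt.Println(sequencerOrder(servers, 42))
	fmt.Println(sequencerOrder(servers, 42)) // identical to the line above
	fmt.Println(sequencerOrder(servers, 7))  // different log, different order
}
```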

  • Now we know how Skynet begins. Vladimir Putin Says Whoever Leads in Artificial Intelligence Will Rule the World.

  • Looks like VMware wants to be the stack that provides a common layer of abstraction across public cloud, private cloud, hybrid cloud, on-premise in a container, and on-premise in a VM: a unified platform that provides consistency of management and operations wherever a workload resides. Network Break 151. VMware announced VMware Cloud on AWS, which is VMware running on bare metal in AWS. Why use this? Opex vs capex; it makes it easier to lift and shift while acting as an evolutionary bridge to the cloud; better hardware management.

  • Shocking! GE discovers that industrial IoT doesn't scale: GE is learning lessons that almost every industrial IoT platform I've spoken with is also learning. The industrial IoT doesn't scale horizontally. Nor can a platform provider compete at every layer...A year ago it decided to stop building out its own cloud data centers and started signing partnerships so customers could run Predix in Amazon's or Microsoft's clouds.

  • How to choose a cloud computing technology for your startup: I like comparing DigitalOcean to a boutique hotel. When using their cloud computing technologies you feel like you’re part of a family and treated like one. DigitalOcean covers everything you need as an early-stage startup, it is easy to use and provides expected convenient pricing models.

  • Excellent overview. Get to know the Actor Model

  • "It’s not everyday that you have a cluster of only 4 machines, that are probably much less powerful than my current MacBook Pro, handling POST requests writing to an Amazon S3 bucket 1 million times every minute." Handling 1 Million Requests per Minute with Golang: our goal was to be able to handle a large amount of POST requests from millions of endpoints...We have decided to utilize a common pattern when using Go channels, in order to create a 2-tier channel system, one for queuing jobs and another to control how many workers operate on the JobQueue concurrently. The idea was to parallelize the uploads to S3 to a somewhat sustainable rate, one that would not cripple the machine nor start generating connections errors from S3. So we have opted for creating a Job/Worker pattern...Note that we provide the number of maximum workers to be instantiated and be added to our pool of workers. Since we have utilized Amazon Elasticbeanstalk for this project with a dockerized Go environment, and we always try to follow the 12-factor methodology to configure our systems in production, we read these values from environment variables...Immediately after we have deployed it we saw all of our latency rates drop to insignificant numbers and our ability to handle requests surged drastically...As soon as we have deployed the new code, the number of servers dropped considerably from 100 servers to about 20 servers...After we had properly configured our cluster and the auto-scaling settings, we were able to lower it even more to only 4x EC2 c4.Large instances and the Elastic Auto-Scaling set to spawn a new instance if CPU goes above 90% for 5 minutes straight.

  • Should everything be API based? The NY Times chucked their API-based system for a Kafka-based pipeline. Publishing with Apache Kafka at The New York Times. Their API system worked pretty much as you might expect: "the producers of content would provide APIs for accessing that content, and also feeds you could subscribe to for notifications for new assets being published. Other back-end systems, the consumers of content, would then call those APIs to get the content they needed." What's the problem? Since the different APIs had been developed at different times by different teams, they typically worked in drastically different ways...they all had their own, implicitly defined schemas...every system that needed access to content had to know all these different APIs and their idiosyncrasies...An additional problem was that it was difficult to get access to previously published content. So they went to a log-based architecture. Why? With the log as the source of truth, there is no longer any need for a single database that all systems have to use. Instead, every system can create its own data store (database) – its own materialized view – representing only the data it needs, in the form that is the most useful for that system...a log-based architecture simplifies accessing streams of content. In a traditional data store...With the log as the source of truth, we can now do immutable deployments of stateful systems. How? The Monolog is our new source of truth for published content. Every system that creates content, when it’s ready to be published, will write it to the Monolog, where it is appended to the end...The Monolog contains every asset published since 1851...As an example, we have a service that provides lists of content — all assets published by specific authors, everything that should go in the science section, etc. This service starts consuming the Monolog at the beginning of time, and builds up its internal representation of these lists, ready to serve on request. We have another service that just provides a list of the latest published assets. This service does not need its own permanent store: instead it just goes a few hours back in time on the log when it starts up, and begins consuming there, while maintaining a list in memory...This Publishing Pipeline runs on Google Cloud Platform/GCP...We run Kafka and ZooKeeper on GCP Compute instances. All other processes — the Gateway, all Kafka replicators, the Denormalizer application built with Kafka’s Streams API, etc. — run in containers on GKE/Kubernetes. We use gRPC/Cloud Endpoints for our APIs, and mutual SSL authentication/authorization for keeping Kafka itself secure.
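
The consuming side of this architecture is pleasingly simple: start at the first offset of the single partition and fold every record into your own materialized view. A rough Go sketch using the segmentio/kafka-go client (broker address, topic name, and the map-as-view are illustrative; the Times' own consumers are built with Kafka's Streams API):

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Read the single-partition log directly (no consumer group), so we
	// control the starting offset ourselves.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"}, // hypothetical broker
		Topic:     "monolog",                  // hypothetical topic name
		Partition: 0,
	})
	defer r.Close()

	// "Start consuming at the beginning of time."
	if err := r.SetOffset(kafka.FirstOffset); err != nil {
		log.Fatal(err)
	}

	// The materialized view: latest version of each asset, keyed by asset id.
	view := map[string][]byte{}
	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err) // a real service would retry and handle shutdown
		}
		view[string(m.Key)] = m.Value // later records supersede earlier ones
		log.Printf("view now holds %d assets (at offset %d)", len(view), m.Offset)
	}
}
```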

  • Thoughts that make you go hmmm. Are There Optical Communication Channels in Our Brains?: All this points to a bigger conundrum. If our brains have optical communications channels, what are they for? This is a question that is ripe for blue skies speculation. One line of thought is based on the fact that photons are good carriers of quantum information. Many people have theorized that quantum processes may be behind some of the brain’s more mysterious processes, not least of which is consciousness itself. Zarkeshian and co are clearly enamored with this idea.

  • Good, Skynet will need this. Facebook reveals it has created an AI digital map showing where EVERY human on the planet lives: Facebook has created a map showing where every single person on the planet lives, it has been revealed...We identified human-built structures, such as buildings or other infrastructure, and used those locations as a proxy for where people live...We then combined our results with existing census counts and created a population data set with 5-meter resolution for 20 countries...While recognizing structures in aerial imagery is a popular task in computer vision, scaling it to a global level came with additional difficulty...[Image in the article: a DigitalGlobe satellite image of Naivasha, Kenya (left) and results of the Facebook analysis of the same area (right).]...'Aside from processing billions of images, finding buildings with high fidelity in rural areas is really a needle-in-a-haystack problem: typically, more than 99 percent of the landmass we analyze does not contain any human-made structure, and it therefore poses a challenge for the machine learning algorithms to learn from such an unbalanced data set.'...'We analyzed 20 countries, which amounts to 21.6 million square kilometers and 350 TB of imagery'...'For one pass of our analysis we processed 14.6 billion images with our convolutional neural nets, typically running on thousands of servers simultaneously'...'Our final data set has a spatial resolution of 5 meters and thereby improves over previous countrywide data sets by multiple orders of magnitude.'

  • People are getting really good at sifting through large piles of data. You can't hide that way anymore. Exxon Misled the Public on Climate Change, Study Says. And Journalists successfully used secure computing to expose Panama Papers.

  • Don't you love those posts where a programmer boasts how they can duplicate a complex system in three notes? Someone finally called BS and backed it up. How I failed to replicate an $86 million project in 1 line of code: Could this project be done for less than $86M? Maybe. Could they use OpenALPR as a starting point? Also maybe. Would it actually reduce the cost? Who knows: it’s a complex project with complex requirements.

  • You don't want users to use your app too much. Familiarity breeds contempt. What's the difference between apps we cherish vs. regret?: On average, comparing between "Happy" and "Unhappy" amounts of usage of the same apps, their unhappy amount of time is 2.4x the amount of happy time.

  • Nice idea. You should use SSM Parameter Store over Lambda env variables. Demo code: theburningmonk/lambda-config-demo.
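
The gist of the argument: Lambda environment variables are baked in at deploy time and sit in plain view in the console, while Parameter Store gives you KMS encryption and config you can update without redeploying. A minimal Go sketch of reading a SecureString parameter at cold start (the parameter name is hypothetical):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ssm"
)

func main() {
	// Region and credentials come from the usual environment config,
	// which Lambda provides automatically.
	sess := session.Must(session.NewSession())
	svc := ssm.New(sess)

	// Fetch a SecureString parameter instead of baking the secret into
	// an environment variable.
	out, err := svc.GetParameter(&ssm.GetParameterInput{
		Name:           aws.String("/my-service/prod/db-password"), // hypothetical name
		WithDecryption: aws.Bool(true),                             // decrypt via KMS
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("got parameter of length", len(*out.Parameter.Value))
}
```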

  • Good code examples. A URL Shortener Service using Go, Iris and Bolt

  • StackStorm/st2: event-driven automation commonly used for auto-remediation, security responses, facilitated troubleshooting, complex deployments, and more. Includes rules engine, workflow, 1800+ integrations (see https://exchange.stackstorm.org), native ChatOps and so forth.

  • Morgan-Stanley/hobbes: a language, embedded compiler, and runtime for efficient dynamic expression evaluation, data storage and analysis

  • solo-io/squash: The debugger for microservices.

  • Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter (article): we found the conversation around alternative narratives of mass shooting events to be largely fueled by content on alternative (as opposed to mainstream) media. Twitter users who engaged in conspiracy theorizing cited articles from dozens of alternative media domains to support their theories. Occasionally, they cited mainstream media as well, either to use details from articles about the event as evidence for their theories or to directly challenge the mainstream narrative. Many of the domains we analyzed were broadly conspiratorial in nature, hosting not one, but many different conspiracy theories. We also detected strong political agendas underlying many of these stories and the domains that hosted them, coding more than half of the alternative media sites as primarily motivated by a political agenda—with the conspiracy theories serving a secondary purpose of attracting an audience and reflecting or forwarding that agenda.

Hey, just letting you know I've written a new book: A Short Explanation of the Cloud that Will Make You Feel Smarter: Tech For Mature Adults. It's pretty much exactly what the title says it is. If you've ever tried to explain the cloud to someone, but had no idea what to say, send them this book.

I've also written a novella: The Strange Trial of Ciri: The First Sentient AI. It explores the idea of how a sentient AI might arise as ripped-from-the-headlines deep learning techniques are applied to large social networks. Anyway, I like the story. If you do too, please consider giving it a review on Amazon.

Thanks for your support!